File Indexing In Golang

I have been working on a pet project to write a File Indexer, which is a utility that helps me to search a directory for a given word or phrase.

The motivation behind building this utility was so that we could search the chat log files for dgplug. We have a lot of online classes and guest sessions, and at times we just remember a name or a phrase used in the class; backtracking through the files using these is not possible as of now. I thought I would take a stab at this problem, and since I am trying to learn Golang I implemented my solution in it. I implemented this solution over a span of two weeks, where I spent time upskilling on certain aspects and also coming up with a clean solution.

Exploration

This started with exploring similar solutions, because why not? It is always better to improve an existing solution than to write your own. I didn’t find any that suited our needs, so I ended up writing my own. The exploration led me to discover a few libraries that could be useful to us: fulltext and Bleve.

I found bleve to have better documentation and a really beautiful design philosophy behind it. The library is designed with a very minimal yet effective thought process. By the end I was sure I was going to use it, and there was no going back.

Working On the Solution

After all the exploration I tried to break the problem into smaller problems and then solve each one of them in turn. The first one was to understand how bleve works. I found out that bleve first creates an index, for which we need to give it the list of files. Behind the scenes the index is basically a map structure where you give it an id and the content to be indexed. So what could be a unique constraint for a file in a filesystem? The path of the file. I used it as the id in my structure and the content of the file as the value.
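To make the id and content idea concrete, here is a toy sketch that uses a plain Go map with the file path as the key. It is only an illustration: the toyIndex type, its methods and the sample paths are made up, and bleve's real index does far more than a substring match.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toyIndex maps a document id (here, a file path) to its content.
// It only illustrates the id -> content idea; it is not how bleve works.
type toyIndex map[string]string

// add stores content under the given id, replacing any previous entry.
func (t toyIndex) add(id, content string) {
	t[id] = content
}

// search returns the sorted ids whose content contains term,
// ignoring case.
func (t toyIndex) search(term string) []string {
	var ids []string
	needle := strings.ToLower(term)
	for id, content := range t {
		if strings.Contains(strings.ToLower(content), needle) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	idx := toyIndex{}
	// Hypothetical paths and contents, for illustration only.
	idx.add("logs/class1.txt", "today we talk about git rebase")
	idx.add("logs/class2.txt", "guest session on packaging")
	fmt.Println(idx.search("Rebase"))
}
```

Because the path is the key, indexing the same file twice simply overwrites the old entry, which is the property that makes the path a convenient id.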

After figuring this out I wrote a function which takes the directory as the argument and gives back the path and the content of each file. After a few iterations of improvement it split into two functions: one is responsible for getting the paths of all the files, and the other just reads a file and gets the content out.

// fileNameContentMap walks the configured root directory and pairs
// every file's path with its content.
func fileNameContentMap() []FileIndexer {
	var files []string
	var fileIndexer []FileIndexer

	err := filepath.Walk(config.RootDirectory, func(path string, info os.FileInfo, err error) error {
		// info is nil when Walk could not stat the path, so check err first.
		if err != nil {
			return err
		}
		if !info.IsDir() {
			files = append(files, path)
		}
		return nil
	})
	checkerr(err)
	for _, filename := range files {
		content := getContent(filename)
		fileIndexer = append(fileIndexer, FileIndexer{Filename: filename, FileContent: content})
	}
	return fileIndexer
}
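The getContent helper called above is not shown in this post. A minimal sketch of what such a helper might look like, assuming it returns the whole file as a string; the error-returning signature and the demo function are my assumptions, not the project's actual code.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// getContent reads the whole file at path and returns it as a string.
// Returning an error is an assumption; the original returns only content.
func getContent(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	// Close the file when we are done so open descriptors do not pile up.
	defer f.Close()

	data, err := io.ReadAll(f)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// demo writes a throwaway file and reads it back through getContent.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "indexer")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "hey.txt")
	if err := os.WriteFile(path, []byte("hello from the log"), 0o644); err != nil {
		return "", err
	}
	return getContent(path)
}

func main() {
	content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(content)
}
```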

This builds a struct which stores the name of the file and the content of the file. And since I can have many files, I need a slice of that struct. This is how a simple data structure evolves into a more complex one.

Now I have the utility of getting all files, getting content of the file and making an index.

This forms a crucial step of what we are going to achieve next.

How Do I Search?

Now that I was able to prepare my data, the next logical step was to retrieve the search results. The way we search for something is by passing a query, so I stubbed out a function which accepts a string and then went on a documentation spree to find out how to search in bleve. I found a simple implementation which returns the id of the file (which is its path) and the match score.


// searchResults opens the index at indexFilename and runs searchWord
// against it as a query string.
func searchResults(indexFilename string, searchWord string) *bleve.SearchResult {
	index, err := bleve.Open(indexFilename)
	checkerr(err)
	defer index.Close()
	query := bleve.NewQueryStringQuery(searchWord)
	searchRequest := bleve.NewSearchRequest(query)
	searchResult, err := index.Search(searchRequest)
	checkerr(err)
	return searchResult
}

This function opens the index, searches for the term and returns the results.

Let’s Serve It

After all that was done I needed a service which does this on demand, so I wrote a simple API server which has two endpoints, index and search. The way mux works is that you give it the endpoint and the handler function that has to be mapped to it. I had to restructure the code in order to make this work. I faced a very crazy bug which, when I narrowed it down, came to a point of a memory leak, and yes, it was because I left the file read stream open. So remember: when you Open, always defer Close.

I used Postman to test it heavily and it was returning good responses. A dummy response looks like this:

 [{"index":"irclogs.bleve","id":"logs/some/hey.txt","score":0.6912244671221862,"sort":["_score"]}]
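A rough sketch of how two such endpoints could be wired up. It uses the standard library's http.ServeMux as a stand-in for the mux package mentioned above, the handler bodies are placeholders, and the q query parameter and the probe helper are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hit mirrors the fields of the sample response shown above.
type hit struct {
	Index string   `json:"index"`
	ID    string   `json:"id"`
	Score float64  `json:"score"`
	Sort  []string `json:"sort"`
}

// newRouter maps the two endpoints to their handler functions.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/index", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder: a real handler would rebuild the index here.
		fmt.Fprint(w, "indexed")
	})
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query().Get("q") // assumed parameter name
		if q == "" {
			http.Error(w, "missing query parameter q", http.StatusBadRequest)
			return
		}
		// Placeholder result: a real handler would query the index with q.
		hits := []hit{{Index: "irclogs.bleve", ID: "logs/some/hey.txt", Score: 0.69, Sort: []string{"_score"}}}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(hits)
	})
	return mux
}

// probe sends a GET request to the router in-process, without opening a port.
func probe(target string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)
	newRouter().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe("/search?q=hello")
	fmt.Println(code, body)
}
```

To serve real traffic, the same router can be passed to http.ListenAndServe along with an address of your choice.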

Missing Parts?

The first missing part was that I didn’t use any dependency manager, which Kushal pointed out to me, so I ended up using dep to do this for me. The next one was the best problem: how do I auto-index a file? Suppose my service is running and I add one more file to the directory; this file’s content wouldn’t come up in the search because the indexer has not run on it. This was a beautiful problem, and I tried to approach it from many different angles. First I thought I would re-run the service every time I add a file, but that’s not a graceful solution. Then I thought I would write a cron job which pings /index at a regular interval, and yet again that was a bad option. Finally I wondered if I could detect changes to the files. This led me to explore gin, modd and fresh.

Gin was not very compatible with mux so I didn’t use it. modd was very nice, but I need to kill the server to restart it since two services cannot run on a single port, and every time I kill that service I kill the modd daemon too, so that possibility also got ruled out.

Finally the best solution was fresh, although I had to write a custom config file to suit the requirement. It still has issues with nested repository indexing, which I am still trying to figure out.

What’s Next?

This project is yet to be containerised and there are missing test cases, so I will be working on them as and when I get time.

I have learnt a lot of new things about the filesystem and how it works because of this project. It helped me appreciate a lot of golang concepts and made me realise the power of static typing.

If you are interested you are welcome to contribute to file-indexer. Feel free to ping me.

Till then, Happy Hacking!

 

Writing Chuck – Joke As A Service

Recently I got really interested in learning Go, and to be honest I found it to be a beautiful language. I personally feel that it has the performance boost of a statically typed language and the easy-prototyping, get-things-done philosophy of a dynamic language.

The real inspiration to learn Go was the amazing number of tools written in it and the ease with which these tools perform, although they seem to be quite heavy. One good example is Docker. So I thought I would write some utility for fun. I have been using fortune, a Linux utility which gives random quotes from a database. I thought, let me write something similar but with jokes. Keeping this in mind I was searching for what I could do, and I landed on jokes about Chuck Norris, or as we call them, facts about him. I landed on chucknorris.io; they have an API which can return different jokes about Chuck. There it was, my opportunity to put something together, and I chose Go for it.

JSON Parsing

The initial version of the utility which I put together was really simple: it used to make a GET request, read the data, put it in the given format and display the joke. But even with this implementation I learnt a lot of things. The most prominent ones were how a variable is exported in Go, i.e. how it can be made available outside its package, and how to parse JSON from a received response to store the useful information in a variable.

The mistake I was making was declaring the fields of the struct with lowercase letters. This caused a problem because lowercase fields are unexported, and the encoding/json package can only fill in exported fields, so the values never showed up where I needed them. It took me a while to figure this out and it was really nice to learn about it. Along the way I learnt how to make a GET request, parse the JSON and use the given values.

Let’s walk through the code. The initial part is a struct with a few fields inside it. The Category field is a slice of strings, which can hold as many elements as it receives. The interesting part is the way you can specify which key from the received JSON is stored in which field of the struct. You can see json:"categories"; that is the way to do it.

In the rest of the code I am making a GET request to the given URL; if it returns a response it will be in res, and if it returns an error it will be in err. The key part here is how marshaling and unmarshaling of JSON takes place.

This is basically folding and unfolding JSON. Once that is done and the values are stored, we just use dot notation to retrieve a value, and done. There is one more interesting part: we passed &joke, which, if you have a C background, you will recognise as passing the memory address, so the function can write into our variable.
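A minimal sketch of the pieces described in this walkthrough: a struct with a json:"categories" tag, a GET request and an Unmarshal into &joke. The Value field and its "value" key, the helper names and the sample payload are assumptions for illustration, not the original chuck code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Joke holds the parts of the response we care about. The field names
// start with a capital letter so that encoding/json can fill them in.
type Joke struct {
	Category []string `json:"categories"`
	Value    string   `json:"value"` // assumed key for the joke text
}

// parseJoke unfolds a JSON document into a Joke.
func parseJoke(body []byte) (Joke, error) {
	var joke Joke
	// &joke hands Unmarshal the address of joke so it can write into it.
	err := json.Unmarshal(body, &joke)
	return joke, err
}

// fetchJoke makes a GET request to url and parses the response body.
func fetchJoke(url string) (Joke, error) {
	res, err := http.Get(url)
	if err != nil {
		return Joke{}, err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return Joke{}, err
	}
	return parseJoke(body)
}

func main() {
	// A made-up sample so the example runs without network access.
	sample := []byte(`{"categories":["dev"],"value":"Chuck Norris can unit test entire applications with a single assert."}`)
	joke, err := parseJoke(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(joke.Category, joke.Value)
}
```

Renaming Value to value in this sketch would leave the field empty after Unmarshal, which is the lowercase-field mistake described earlier.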

This was working well and I was quite happy with it, but there were two problems I faced:

  1. The response used to take a while to return the jokes
  2. It doesn’t work without internet

So I showed it to Sayan and he suggested building a joke caching mechanism. This would solve both problems: since jokes are stored locally on the file system it takes less time to fetch them, and there is no dependency on the internet except when you are caching jokes.

So I designed the utility in such a way that you can cache as many jokes as you want. You just have to run chuck --index=10; this will cache 10 jokes for you and store them in a database. Then a random joke is selected from those jokes and shown to you.
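A small sketch of how an --index flag like this can be read with the standard flag package. The parseIndexFlag helper, the default of zero and the rejection of negative values are assumptions for illustration; the actual chuck code may handle this differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseIndexFlag reads the --index flag from args and returns how many
// jokes to cache. Zero is taken to mean "do not cache, just show a joke".
func parseIndexFlag(args []string) (int, error) {
	fs := flag.NewFlagSet("chuck", flag.ContinueOnError)
	count := fs.Int("index", 0, "number of jokes to cache")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *count < 0 {
		return 0, fmt.Errorf("index must not be negative, got %d", *count)
	}
	return *count, nil
}

func main() {
	count, err := parseIndexFlag(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(2)
	}
	fmt.Println("jokes to cache:", count)
}
```

The flag package accepts both -index=10 and --index=10, so the command shown above works as written.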

I learnt to use flag in Go and also how to integrate a sqlite3 database in the utility. The best learning was handling files. My logic was that any time you are caching you should get a fresh set of jokes, so when you cache I completely delete the database and create a new one for the user. To do this I need to check if the database already exists and, if it does, remove it. I ended up looking for how to do that in Go. There are a bunch of built-in APIs which help you do that, but they were misleading for me. There is os.Stat, os.IsExist and os.IsNotExist. What I understood was that os.Stat would give me the status of the file, while the other two could tell me whether the file exists or not. To my surprise things don’t work like that. IsExist and IsNotExist are two different error checks, each taking the error returned by a call such as os.Stat, and guess what, the negation of IsExist is not IsNotExist; good luck wrapping your head around it. I eventually ended up answering this on stackoverflow.
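A sketch of the check-then-remove logic, assuming the database is a single file on disk. The fileExists, resetDB and demo helpers and the jokes.db name are hypothetical. The point it shows is that os.Stat returns a nil error when the file is there, and os.IsExist(nil) is false, so the test has to be phrased with os.IsNotExist.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists reports whether path exists, based on the error from os.Stat.
func fileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	// Some other failure, such as a permission problem.
	return false, err
}

// resetDB removes the database file if it is already present, so that
// caching always starts from a fresh file.
func resetDB(path string) error {
	exists, err := fileExists(path)
	if err != nil || !exists {
		return err
	}
	return os.Remove(path)
}

// demo creates a throwaway file, resets it, and reports whether the
// file existed before and after the reset.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "chuck")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "jokes.db") // hypothetical file name
	if err := os.WriteFile(path, []byte("cached"), 0o644); err != nil {
		return false, false, err
	}
	before, err := fileExists(path)
	if err != nil {
		return false, false, err
	}
	if err := resetDB(path); err != nil {
		return before, false, err
	}
	after, err := fileExists(path)
	return before, after, err
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```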

After a few iterations of using it on my own and fixing a few bugs the utility is ready, except for the fact that it is missing test cases, which I will soon integrate. This has helped me learn a lot of Go and I have something fun to suggest to people. I am open to contributions and hope you will enjoy this utility as much as I do.

Here is a link to chuck!

Give it a try and till then Happy Hacking and Write in GO! 

Featured Image: https://gopherize.me/

Dockah! Dockah! Dockah!

I have been dabbling with docker for quite some time. To be honest, when it was introduced to me I didn’t understand it much, but as time passed and I started experimenting with it I got to know the technology better and better. This made me understand various concepts better. I understood virtualization, containerization and sandboxing, and got to appreciate how docker solves the problem of “works on my machine”.

When I started using docker I used to just run a few commands and I could get the server running, which I could access through the browser; that was more than enough for me. When I made changes to the code I could see them reflected in the running application, and I was a happy man.

This was all abstract thinking and I was not worried about what was going on inside the container; it was a black box for me. This went on for a while, but it shouldn’t have: I have the right to know things and how they work. So I started exploring the realm, and the more I read about it the more I fell in love with it. I eventually landed on Jessie’s blog. The amount of things she and Gautham have taught me is crazy. I could never have thought that docker, being a headless server, could actually be used to contain an application in such a way that you decide how many resources it is given. We at jnaapti have been working on various other possibilities, but that is for some other time.

So yeah, there is more to docker than just starting the application and getting things to work. Let’s try to understand a few things about docker; this is purely from my experience and how I understood things. Containers are virtual environments which share some of the resources of your host operating system. Containers are like Airbnb guests, for which the host is the operating system. Containers are allowed to use resources only when the user of the operating system gives them permission. Now, I use them basically in two ways, as stateful containers or stateless containers: stateful being the ones which have some data generated and stored in them, while stateless are the ones which don’t have any dependency on data.

Let me show you one of the use cases that I generally use containers for. Now, people may disagree and say I am exploiting it or using the power for the wrong purpose, but to be very frank, if it solves my problem why should I care XD. Imagine I want to learn to write Go and I don’t want to install it on my system but have an isolated environment for it. There are two ways: I can pull a docker image which has Go in it, or get a plain image and install Go in it. An image here is just like an iso file which is used to install an operating system on your machine. Let’s see what images I have on my machine.

I would run docker images and the output looks like this:

[Screenshot: output of docker images]

This shows that I have a znc image; I use it to run a znc bouncer. Let’s try to pull an ubuntu image and install golang in it. The command is docker pull ubuntu.

[Screenshot: output of docker pull ubuntu]

Now we need to run a docker container and get shell access to it. For that we run the command docker run -it --name="golang" ubuntu:latest /bin/bash

Let’s break it down and see what is going on here. run tells docker to start the container; the -it option says that this is going to be an interactive session and we need to attach a tty to it; --name is the option to give a name to the container; ubuntu:latest is the name of the image; and /bin/bash is the process that needs to be run.

Once you run this command you will get a root prompt, something like this:

[Screenshot: root prompt inside the container]

Now you can run any command inside it and you will be isolated from your host machine. For installing golang let’s follow these instructions from Digital Ocean. You should ignore the ssh instruction and instead run apt update and apt install curl nano. Follow the rest normally and you will see it working like this:

[Screenshot: Go running inside the container]

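You can play around with golang in the container and when you are done you can exit. The container stays around; it’s just that you are out of it. Exiting the shell stops the container, so bring it back with docker start golang, and when you want the shell again you can run,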

docker exec -it golang /bin/bash

You will get the shell again. This is what is called a stateful container, since it will have all the files that you have created. You can also mount a volume into a container using the -v option of docker run. This acts as if you plugged a pen drive into the container, the storage being a directory you have created on the host machine. Volumes have to be given when the container is created, so this is a docker run rather than a docker exec; if a container named golang already exists, remove it first or pick another name.

docker run -it --name="golang" -v /home/fhackdroid/go-data:/go-data ubuntu:latest /bin/bash

This will mount /home/fhackdroid/go-data at /go-data in the docker container.

These are a few of the ways I use docker in my daily life. If you use it in any other way and want to share, do write to me; I would be more than happy to know.

Happy Hacking Folks!