
Profiling in Go

Sam Shetty


Writing code optimizations based on assumptions is like putting a plaster on a swollen hand without knowing whether the bone is broken. Like an X-ray machine, a profiler helps identify the problematic areas and performance bottlenecks in our code. This post will walk you through the whats, whens, whys, and hows of profiling in Go, and along the way we will see how the performance stats improve with the optimized code.

WHAT is profiling?


Profiling is an optimization technique in which you gather statistical data about your application while running it under load, and then use that data to find and fix the hot spots in your code. We can profile Go programs using the net/http/pprof package and visualize the generated profiling data with the pprof tool.

WHEN to profile your code?

We typically reach for the profiler when the monitoring data from our production servers reports slow response times, high CPU usage, or high memory utilization.

WHY to profile your code?

Profiling helps you target the weaker sections of your code, which may be functionally correct but give way when you run the code at scale. Like debugging, profiling also sets you on the path to improving your code and learning the language better.

HOW to profile your code?

10 steps to profiling your Go code

STEP 1. Set up the workspace:

> mkdir -p ~/go/src/github.com/sshetty10 && cd ~/go/src/github.com/sshetty10
> git clone https://github.com/sshetty10/go-seed-db

STEP 2. Import the pprof package in main.go. This blank import registers, as a side effect, handlers under the /debug/pprof/ URL if you are using the DefaultServeMux.

import _ "net/http/pprof"
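For context, here is a minimal sketch of a main.go that exposes the pprof endpoints via the DefaultServeMux (the :8080 port and the /ping handler are placeholders for illustration, not the actual go-seed-db code):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side-effect import: registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// A placeholder application handler.
	http.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("pong"))
	})
	// The pprof endpoints are now served alongside your own handlers.
	log.Fatal(http.ListenAndServe(":8080", nil))
}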

If not, you need to register the handlers on your router yourself, as shown here:


// Assumes: import "net/http/pprof" (named, not blank) and the chi router package.
pprofRouter := chi.NewRouter()
pprofRouter.HandleFunc("/debug/pprof/", pprof.Index)
pprofRouter.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
pprofRouter.HandleFunc("/debug/pprof/profile", pprof.Profile)
pprofRouter.HandleFunc("/debug/pprof/symbol", pprof.Symbol)

// Manually add support for paths linked to by the index page at /debug/pprof/.
pprofRouter.Handle("/debug/pprof/goroutine", pprof.Handler("goroutine"))
pprofRouter.Handle("/debug/pprof/heap", pprof.Handler("heap"))
pprofRouter.Handle("/debug/pprof/threadcreate", pprof.Handler("threadcreate"))
pprofRouter.Handle("/debug/pprof/block", pprof.Handler("block"))

Build and run:

> go build
> ./go-seed-db
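Once the server is up, you can sanity-check that the pprof endpoints from Step 2 are wired correctly (a quick optional check; :8080 matches the port used by the load-test commands later in this post):

> curl http://localhost:8080/debug/pprof/
> curl "http://localhost:8080/debug/pprof/heap?debug=1"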

STEP 3. Create your Postman collection for ListTrainers. Click Send to test the API.


STEP 4. Run a load test using the Postman collection runner or Hey.

  1. With the Postman collection runner: you can set the number of iterations you want the request to run for in the runner's settings.


2. With Hey: you first need to install Hey. (To learn more about Hey, check my post here.)

> brew install hey

or

> go get -u github.com/rakyll/hey

> hey -m POST -d '{"query":"query{\n\ttrainers{\n    name\n    city\n    licenseID\n    licenseState\n    }\n}","variables":{}}' -T application/json -c 10 -z 5s http://localhost:8080/v2/gql

You can also run the load test a fixed 1000 times, either via the Postman Collection Runner's iteration count or via Hey's -n 1000 flag, as shown below. My personal preference is Hey because of the concurrent model it uses to fire these requests.
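For example, the fixed-count variant of the earlier Hey command looks like this (same payload and endpoint; only -z 5s is swapped for -n 1000):

> hey -m POST -d '{"query":"query{\n\ttrainers{\n    name\n    city\n    licenseID\n    licenseState\n    }\n}","variables":{}}' -T application/json -c 10 -n 1000 http://localhost:8080/v2/gql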

Please ensure that you run the profiler simultaneously in a different tab while you run your load test:

> go tool pprof --seconds 5 http://localhost:8080/debug/pprof/profile
	

We sent 1000 ListTrainer requests. go tool pprof will save a profile file with a .gz extension, printing a message like "Saved profile in /Users/foo/pprof/pprof.samples.cpu.011.pb.gz". We will use this file for our pprof visualizations.

STEP 5. Analyzing the generated pprof stats:

1. Go to http://localhost:8080/debug/pprof/. The following stats are gathered by the Go profiler:

goroutine - stack traces of all current goroutines
heap - a sampling of memory allocations of live objects
allocs - a sampling of all past memory allocations
threadcreate - stack traces that led to the creation of new OS threads
block - stack traces that led to blocking on synchronization primitives
mutex - stack traces of holders of contended mutexes

In my experience, I have been able to identify memory leaks using the heap profile and also optimize time-hogging processes. For instance, NumGC signifies how many GC cycles have taken place (in complex application code there will be many), and PauseNs tells you how long each GC pause took; see the sketch below for reading the same numbers in code.
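These GC figures come from the runtime's memory statistics. If you prefer reading them in code rather than from the debug page, here is a minimal, self-contained sketch using runtime.ReadMemStats (illustrative only, not part of go-seed-db):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("NumGC: %d\n", m.NumGC)               // completed GC cycles
	fmt.Printf("HeapAlloc: %d bytes\n", m.HeapAlloc) // bytes of live heap objects
	// PauseNs is a circular buffer of recent pause times;
	// this expression indexes the most recent one.
	fmt.Printf("Last GC pause: %d ns\n", m.PauseNs[(m.NumGC+255)%256])
}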

2. To a visual person like me, the raw stats above are not very easy to read. I prefer using the top view and flame graphs in the pprof tool UI. You can view the stats in these representations by running:

> go tool pprof -http=":8000" /Users/sshetty/workspace/go/src/github.com/sshetty10/go-seed-db/go-seed-db /Users/sshetty/pprof/pprof.samples.cpu.011.pb.gz		

where pprof.samples.cpu.011.pb.gz is the name of the pprof file generated in Step 4. This will open a window in your browser.


3. The top view shows that the LicenseState function is taking a noticeable share of time and can be optimized.

STEP 6. Benchmarking: now that we have identified the target function, we will write a benchmark test for it to find the exact line causing the issue. You can learn more about benchmark testing in my post here.



// resolver and id are assumed to be initialized elsewhere in the
// test package (e.g. in TestMain or a setup helper).
func BenchmarkLicenseState(b *testing.B) {
    ctx := context.Background()
    tr := &model.Trainer{
        ID:        id,
        Name:      "somenewtest",
        City:      "somecity",
        Age:       70,
        LicenseID: "VA-38274",
    }
    for i := 0; i < b.N; i++ {
        resolver.Trainer().LicenseState(ctx, tr)
    }
}

Run the benchmark test. A prof.cpu file will get generated in your project directory.

> go test -run=XXX -bench=. -benchmem -cpuprofile prof.cpu

Use go tool pprof to see the benchmark stats in prof.cpu:


> go tool pprof go-seed-db prof.cpu

To see the lines of code that can be optimized, run the top10 -cum command. This lists the top 10 entries by cumulative time, i.e. the code paths taking up the largest share of the time spent in this function call.

 (pprof) top10 -cum
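To drill further into a single function, pprof's list command annotates its source line by line with flat and cumulative time (using our target function's name here):

 (pprof) list LicenseState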

STEP 7. Optimize your code

We see that the strings.Split call is a line we can optimize. LicenseState splits a license ID of the format VA-74653 on the "-" separator to get the state code, i.e. the two characters before the hyphen. We do not care about any later hyphens and only need the characters before the first one. strings.Split allocates and returns a slice of strings that we mostly never use, so we can get rid of it.

Old code with strings.Split:

func (r *trainerResolver) LicenseState(ctx context.Context, trainer *model.Trainer) (string, error) {
	licenseState := strings.Split(trainer.LicenseID, "-")[0]
	return licenseState, nil
}

Optimized code:

func (r *trainerResolver) LicenseState(ctx context.Context, trainer *model.Trainer) (string, error) {
	idx := 0
	for idx < len(trainer.LicenseID) {
		if trainer.LicenseID[idx] == '-' {
			break
		}
		idx++
	}
	licenseState := trainer.LicenseID[:idx]

	return licenseState, nil
}
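As an aside, the standard library can express the same first-hyphen scan more compactly; here is a sketch of an equivalent using strings.IndexByte (functionally the same idea, not benchmarked here):

func (r *trainerResolver) LicenseState(ctx context.Context, trainer *model.Trainer) (string, error) {
	// IndexByte returns -1 if no hyphen is present, in which case
	// we keep the whole ID, matching the loop above.
	if idx := strings.IndexByte(trainer.LicenseID, '-'); idx >= 0 {
		return trainer.LicenseID[:idx], nil
	}
	return trainer.LicenseID, nil
}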

STEP 8. Run the benchmarks again:



Stats before optimization:

9065338 operations
168.1 ns/op - CPU time per operation
40 B/op - the average number of bytes allocated per operation
2 allocs/op - the number of allocations per operation

Stats after optimization:

35262470 operations
30.37 ns/op - CPU time per operation
5 B/op - the average number of bytes allocated per operation
1 allocs/op - the number of allocations per operation

That is a tremendous improvement from just a few lines of code. The LicenseState function no longer shows up in the top10 -cum list either.
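If you want a statistically sound before/after comparison instead of eyeballing two runs, the benchstat tool (golang.org/x/perf/cmd/benchstat) can diff saved benchmark outputs; a sketch, assuming you save each run's output to a file:

> go install golang.org/x/perf/cmd/benchstat@latest
> go test -run=XXX -bench=. -benchmem -count=10 > old.txt
(apply the optimization, then)
> go test -run=XXX -bench=. -benchmem -count=10 > new.txt
> benchstat old.txt new.txt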

STEP 9. Run the profiler again to generate a new pprof.gz file and check the stats.

We see that the LicenseState function has dropped out of the top functions list. Our optimizations seem to have worked.

STEP 10. Our final test is to check whether our response times have improved, using our load tester (Postman or Hey).

Hey Before Optimization:


Total responses in 5 seconds: 7095
Requests/sec: 1614

Hey After Optimization:

Total responses in 5 seconds: 8148
Requests/sec: 1614

We see a significant improvement in the number of requests served within the 5-second window.

Conclusion

Profiling your code can produce magical results, and it can be quite addictive. I would like to end this post by saying that we should also know when to stop optimizing. Looking at merge requests day in and day out, I believe there needs to be a balance between code optimizations and code readability. Too much of anything is detrimental, not only in code but in life too. Hope you have fun profiling!





