A Live Developer Journal

Finding Hotspots In Your Codebase

Identifying and prioritizing problem areas in a large codebase is difficult. No one can hold a hundred thousand lines of code in their head, especially when multiple teams work on different areas and there is legacy code that no one understands or feels responsible for.

In his book “Your Code as a Crime Scene”, Adam Thornhill draws on his knowledge of forensic psychology to develop a number of techniques that help us find, prioritize and address these problem areas.

Finding Hotspots In Your Code

In forensic psychology, geographical profiling is where the locations of related crimes are mapped in order to predict the area an offender is likely to reside - the Hotspot. Hotspots allow the police to more effectively allocate their resources. They can investigate a smaller area instead of a whole city.

In a programming context, a hotspot is a complex area of the codebase that changes a lot (and therefore takes a lot of effort to maintain). Both complexity and change matter because neither tells us the full picture on its own. Looking at change alone, we might mistake a config file that is edited frequently for a hotspot; looking at size alone, we might flag a stable, well-tested legacy file with hundreds of lines of code. Together, the number of lines of code (a proxy for complexity) and the number of times a file has been modified (a proxy for effort) are a good indicator of problem areas in our codebase (Gall and Krajewsky, 2003).
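As a rough illustration of why both measures are needed, the sketch below keeps only files that are both large and frequently changed. The function name `hotspot_candidates`, the sample file names and the two thresholds are assumptions made up for this example; they do not come from the book.

```shell
# Hypothetical helper: keep only files that are both large and often changed.
# Input on stdin, one file per line: "<change count> <line count> <filename>".
# The default thresholds (10 changes, 200 lines) are illustrative assumptions.
hotspot_candidates() {
  min_changes="${1:-10}"
  min_lines="${2:-200}"
  awk -v c="$min_changes" -v l="$min_lines" '$1 >= c && $2 >= l { print $3 }'
}

# Made-up sample data: a busy config file, a stable legacy file,
# and a file that is both large and frequently modified.
printf '%s\n' \
  '42 12 config/settings.json' \
  '1 900 src/legacy/report.js' \
  '37 850 src/billing/invoice.ts' | hotspot_candidates 10 200
```

With this sample data only `src/billing/invoice.ts` passes both filters: the config file changes often but is tiny, and the legacy file is large but rarely touched.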

Hotspot Shell Script

Thornhill’s book mentions a few tools that help narrow down hotspots, including CodeCity and Code Maat. I decided to write this one-liner shell script instead (with the help of a colleague from work), mostly because I struggled to get those tools working on my Mac due to operating system incompatibilities.

echo "modified count,line count,filename" > hotspots.csv && join -j 2 -o 2.1,1.1,1.2 <(wc -l **/*.{js,ts} | sort -b -k 2,2) <(git log --diff-filter=M --name-only --format= | sort | uniq -c | grep src) | sort -r -n -k 1 -k 2 | sed 's/ /,/g' >> hotspots.csv

The result of running the above shell script was as follows:

[Image: table of results, captioned “Discovering hotspots in your codebase”]

The really cool thing about the table above is that the first file listed is exactly the one that the CTO of the startup I work at called the most nightmarish file in the entire codebase. Super cool!
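For anyone adapting the one-liner to another project, its join and sort stages can be pulled out into a small function that is easier to read and to test. This is only a sketch: the name `hotspot_csv` and the choice to pass the two inputs as files are assumptions for illustration, and file names containing spaces are not handled.

```shell
# Sketch of the join and sort stages as a reusable function.
# $1: file with "<line count> <filename>" rows (for example from `wc -l`)
# $2: file with "<change count> <filename>" rows (for example from `uniq -c`)
hotspot_csv() {
  work_dir=$(mktemp -d) || return 1
  echo "modified count,line count,filename"
  # join expects both inputs to be sorted on the join field (the filename).
  sort -b -k 2,2 "$1" > "$work_dir/lines"
  sort -b -k 2,2 "$2" > "$work_dir/changes"
  # Emit "<change count> <line count> <filename>" for files present in both,
  # order by change count and then line count (largest first),
  # and turn the separating spaces into commas.
  join -j 2 -o 2.1,1.1,1.2 "$work_dir/lines" "$work_dir/changes" |
    sort -r -n -k 1 -k 2 |
    sed 's/ /,/g'
  rm -rf "$work_dir"
}
```

The two input files could be produced by saving the output of the `wc -l` and `git log ... | uniq -c` stages of the one-liner, and then running `hotspot_csv lines.txt changes.txt > hotspots.csv`. The `total` row that `wc -l` prints has no match in the change counts, so the join drops it.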