How I Read Unfamiliar Codebases
Last month I was handed ownership of a service I'd never seen before. The previous maintainer had left, the documentation was a README that mostly described what the service was supposed to do rather than what it actually did, and I had two weeks before I needed to ship a change to it.
This is a fairly normal situation. Here's what I do.
Start with the tests
Before I read a single line of application code, I read the tests. Not to understand coverage — to understand intent. Tests describe what the author thought the code was supposed to do, in smaller, more readable units than the implementation.
In Go, this means find . -name '*_test.go' | head -20 and then reading through the test names before I open the files. Test names are often the best documentation in a codebase. TestUserCreate_DuplicateEmail_ReturnsConflict tells me more than three paragraphs of inline comments would.
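When a repository has more test files than I want to open by hand, the same idea can be scripted. The following is a minimal sketch, not part of my usual workflow: it uses the standard go/parser package to pull top-level Test function names out of a source string. The function name testNames and the sample test file contents are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// testNames parses Go source and returns the names of top-level
// functions whose names start with "Test", in source order.
func testNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example_test.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		// Skip non-function declarations and methods (methods have a receiver).
		if !ok || fn.Recv != nil {
			continue
		}
		if strings.HasPrefix(fn.Name.Name, "Test") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	// Hypothetical test file contents, for illustration only.
	src := `package user

import "testing"

func TestUserCreate_DuplicateEmail_ReturnsConflict(t *testing.T) {}
func TestUserCreate_ValidInput_ReturnsCreated(t *testing.T) {}
func helper() {}
`
	names, err := testNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```

Reading real files would mean walking the tree and passing each file's contents to testNames; the parsing step stays the same.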
If the tests are bad or missing, that's also information. It tells me the codebase is probably harder to change than it should be, and I should be more conservative about what I modify.
Find the entry point
For a service: where does a request come in? For a library: what's the public API surface? For a CLI: what does main() call?
In Go, main.go or cmd/*/main.go is usually the answer. In Python, it's less predictable — I look for if __name__ == '__main__' or an entry point defined in pyproject.toml. The point is to find the spine of the thing and read outward from there, following function calls, not file names.
File names are organized by whoever wrote the code, which means they're organized around how the author thought about the problem. Entry points are organized by how the system actually works.
Grep before you browse
When I need to understand where something happens — where a database connection is opened, where an HTTP handler is registered, where a specific error is produced — I grep for it rather than browsing the directory tree.
rg 'db\.Open|sql\.Open|pgxpool\.New' --type go
This gives me the actual locations in seconds instead of the fifteen minutes I'd spend opening files and scanning them. I use ripgrep for this. The -A and -B flags (lines after/before the match) are useful when I need context.
The other grep I run early: search for the name of any external service the codebase talks to. If it calls a payments API, I search for the payment provider's name and see every place it's mentioned. That usually shows me the integration points without reading through all the business logic around them.
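Text search also matches comments and string literals. When that noise gets in the way, a syntax-aware search is an option. The sketch below is an assumption-laden illustration rather than something grep can't do: it uses go/parser and go/ast to report the lines where a call is written as pkg.Func. The name findCalls and the sample source are hypothetical, and the match is purely syntactic, so it won't see through import aliases.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findCalls returns the line numbers of calls whose callee is written as
// pkg.Func and whose "pkg.Func" spelling appears in targets.
func findCalls(src string, targets map[string]bool) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Only match the simple form ident.Name, e.g. sql.Open.
		pkg, ok := sel.X.(*ast.Ident)
		if ok && targets[pkg.Name+"."+sel.Sel.Name] {
			lines = append(lines, fset.Position(call.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

func main() {
	// Hypothetical source file, for illustration only.
	src := `package store

import "database/sql"

// connect mentions sql.Open in a comment, which a text search would also hit.
func connect(dsn string) (*sql.DB, error) {
	return sql.Open("postgres", dsn)
}
`
	lines, err := findCalls(src, map[string]bool{"sql.Open": true, "pgxpool.New": true})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("call sites on lines:", lines)
}
```

For a first pass I would still reach for ripgrep; this kind of tool only earns its keep when the plain-text results are too noisy to read.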
Run it before you read it
If I can get the service running locally, I do that before I understand it fully. Actually seeing it respond to requests, even just hitting the health check endpoint, changes how I read the code. It gives me a mental model of what's alive and what's scaffolding.
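Hitting a health check can be as small as one GET request. Here is a minimal sketch in Go, assuming the service exposes an HTTP health route that returns 200 when healthy. The /healthz path, the checkHealth helper, and the stub server standing in for the real service are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkHealth sends a GET to url and reports whether the response
// status was 200 OK.
func checkHealth(url string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

// newStubService starts a throwaway local server standing in for the
// real service. The /healthz route is an assumption for illustration.
func newStubService() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubService()
	defer srv.Close()

	ok, err := checkHealth(srv.URL + "/healthz")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("healthy:", ok)
}
```

Against a real service, the stub goes away and checkHealth is pointed at whatever address and route the service actually listens on.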
The setup process is also information. Services that take thirty steps to run locally are services that nobody runs locally, which means untested assumptions everywhere. Services that start with go run ./cmd/server have been run by the people who wrote them.
Take notes, even bad ones
I keep a scratch file open while I'm reading. It's not documentation; it's a working-memory dump: questions I have, things I noticed, guesses about why something was done a certain way. Most of it never goes anywhere, but writing it forces me to be specific about what I don't understand instead of staying vaguely confused.
After a few hours of reading, I can usually write a one-paragraph description of what the service does and how it does it. If I can't, I keep reading. The inability to write it clearly means I don't understand it clearly yet.