Follow the leads: Thinking and solving software problems

Welcome to the second stage of your Sherlock training. We already discussed how to reproduce complicated software problems in part one and by now you should be able to see the problem happening and have a few notes. It is time to tap into that powerful pink brain of yours and start thinking about potential solutions.

The mysterious case of the never-ending login

It was a cold and rainy night in October and I had just grabbed a new database dump from a development server to continue working on a Drupal site locally. After feeding MySQL with the fresh data and clearing Drupal’s cache, I proceeded to log in and then I noticed it: a murder had been committed!

The problem

I could browse around, get to the login form and even submit it without getting an error, but for some reason I wasn’t actually logged in.

Background information

This is a Drupal 7 site running on Apache and being cached with Redis and Varnish, all of which are correctly set up on my local server and as similar as possible to the development server.

Having spent more than enough quality time with Varnish and Drupal, I know that most problems related to the login process can be traced back to cookies, so that is the first obvious thing to check. I quickly ruled it out here by reading my local VCL file and comparing it with the ones used in the development, staging, and production environments, all of which worked fine.

I could also easily confirm that this wasn’t a bug in the code, as I was using the same codebase everywhere, up to the latest commit. Still, I suspected it was something related to Varnish. I bypassed it to confirm: first, by using Apache’s IP address and port to reach the site directly, then, by disabling Varnish altogether. In both cases, I could log in without trouble. So, I was onto something.

The fix

My next step was thinking about the last time the site worked correctly on my local server and what had changed since then, especially anything related to Varnish. I just had to check the date of the last database dump I had grabbed (I usually put a timestamp in the file name) and run git log to find a configuration change related to IP addresses for reverse proxying.

I got to a line like this in the site’s settings.php file:

$conf['reverse_proxy_addresses'] = array('127.0.0.1', '172.31.3.10', '172.31.39.4');

This setting tells Drupal which addresses belong to load balancers or Varnish servers, both of which we run on production, so it can ignore them and correctly get the client IP of users visiting the site. The easiest way to test if that was the problem was to comment the line out. I did that and it fixed the issue. I just had to move that line to the environment-dependent settings files I was already using, and case closed.
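For illustration, here is one way such a split could look. This is a minimal sketch, not the actual setup used on this site: the file name settings.local.php and the convention of keeping it out of version control are assumptions.

```php
<?php
// settings.php, shared by every environment and tracked in git.
// Assumption: each server keeps its own untracked settings.local.php
// next to this file (hypothetical file name).
$local_settings = dirname(__FILE__) . '/settings.local.php';
if (file_exists($local_settings)) {
  include $local_settings;
}

// On servers that sit behind the proxies, settings.local.php would then
// carry the proxy list and nothing else:
//
//   $conf['reverse_proxy_addresses'] = array('127.0.0.1', '172.31.3.10', '172.31.39.4');
//
// A local machine simply leaves that line out of its own copy.
```

With this layout the shared file never mentions an IP address, so a fresh checkout cannot bring another environment's proxy list along with it.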

Lessons learned

And why did the problem occur only on my local server? I’m glad you asked. Well, it turns out that my browser and the server run on the same box, so requests from my browser and from Varnish both come from 127.0.0.1. With that address listed as a trusted proxy, the line above left Drupal without a usable client IP and made it throw away my session information as soon as it was created.
So exciting, I know.
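
To make the mechanism easier to picture, here is a rough sketch of how a proxy-aware application might pick the client address. It is an assumption-laden simplification, not Drupal’s actual code: the function name sketch_client_ip() and the visitor address 203.0.113.7 are made up for the example. It only shows how the client address can vanish when the browser shares an address with a trusted proxy; what the application does next with an empty address is outside the sketch.

```php
<?php
/**
 * Hypothetical, simplified client-IP resolver (not Drupal's code).
 *
 * Discards every address that belongs to a trusted proxy and returns the
 * right-most one that is left, or NULL when nothing is left.
 */
function sketch_client_ip($remote_addr, $forwarded_for, array $trusted_proxies) {
  // Without a forwarding header, the direct peer is the client.
  if ($forwarded_for === '') {
    return $remote_addr;
  }
  // Addresses from the header come first, the direct peer goes last.
  $chain = array_map('trim', explode(',', $forwarded_for));
  $chain[] = $remote_addr;
  // Drop every address that belongs to a trusted proxy.
  $untrusted = array_diff($chain, $trusted_proxies);
  // array_pop() returns NULL when the array is empty.
  return array_pop($untrusted);
}

$trusted = array('127.0.0.1', '172.31.3.10', '172.31.39.4');

// Production-like request: a remote visitor arriving through a proxy.
var_dump(sketch_client_ip('172.31.3.10', '203.0.113.7', $trusted));

// Local request: browser and Varnish both live on 127.0.0.1.
var_dump(sketch_client_ip('127.0.0.1', '127.0.0.1', $trusted));
```

The first call yields the visitor’s address, while the second yields NULL because every address in the chain counts as a trusted proxy.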

How would I solve this case?

I consider the most obvious approaches first — usually the quickest to implement — and research and scribble a few ideas before I focus on more complex and time-consuming alternatives.

Also, don’t forget about the silly things: debugging code still running, incorrect settings or test URLs (I’ve wasted more time than I’d like to admit fixing code on one server while testing on another), assigning instead of comparing, and so forth. Even the most experienced programmers make these mistakes. Remember: automated tests won’t catch every flaw in your logic, so read your code carefully, more than once.
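
As a reminder of how quietly the assign-instead-of-compare mistake slips in, here is a small sketch. The variable names and the role check are invented for the example.

```php
<?php
$role = 'editor';
$buggy_result = 'access denied';

// Bug: a single "=" assigns 'admin' to $role, and the non-empty string
// evaluates as TRUE, so this branch always runs.
if ($role = 'admin') {
  $buggy_result = 'access granted';
}

$role = 'editor';
$fixed_result = 'access denied';

// Fix: "===" compares value and type and leaves $role untouched.
if ($role === 'admin') {
  $fixed_result = 'access granted';
}

echo $buggy_result, "\n"; // access granted
echo $fixed_result, "\n"; // access denied
```

Both versions run without a warning, which is exactly why this one is worth a second read.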

Keep an open mind and don’t assume all problems are alike, although many times they are. Avoid the trap of treating everything as a nail just because you are holding a hammer.

Follow the paths

Breathe and think. I know it’s easier said than done. Do not let stress take over. If you’re reading this, you have the skills to find a solution. Trust me.

Whatever you do, don’t start aimlessly writing code to see if something makes the problem go away. You have to understand the problem, find the cause and eliminate it, or it will eventually come back to bite you. There’s no luck or deities involved here. It’s all good old software doing what you tell it to do.

Just as a detective follows the money trail to solve a millionaire’s murder, you need to follow the paths your application takes, including the libraries it depends on. If you are using open source software, you’ll have a clearer view of what’s happening than the guys using that other thing. You can insert breakpoints to debug here and there or, if you don’t have a full debugging approach in place (you should), just stop the program and print values. Find where things break and zoom in on the problematic areas.
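
If printing values is all you have, a tiny helper keeps the output consistent and easy to find. This is a minimal sketch: debug_value() is a hypothetical name, the sample array is made up, and where the message ends up depends on how PHP’s error log is configured.

```php
<?php
/**
 * Hypothetical helper: formats a labelled value, sends it to PHP's error
 * log and returns the formatted message.
 */
function debug_value($label, $value) {
  // print_r() with TRUE as second argument returns the text instead of
  // printing it.
  $message = sprintf('[debug] %s: %s', $label, print_r($value, TRUE));
  error_log($message);
  return $message;
}

// Made-up data standing in for whatever you are inspecting.
$session = array('uid' => 0, 'hostname' => NULL);
debug_value('session after login', $session);
```

The fixed "[debug]" prefix makes the lines easy to search for in the log, and easy to hunt down and remove before the code ships.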

The more organized your method, the faster you’ll be. Didn’t I mention you have to be fast to fix tricky problems? Oh yeah, you have to.

Give yourself a break

As soon as you start feeling stuck (it will happen a lot during your detective career), disengage and do something else. This could be something as intellectually amusing as browsing Reddit and reading xkcd or just going out for a walk, playing your guitar or taking a bath (I’ve solved hundreds of problems all wet and soaped up. And yes, I know you didn’t need that level of detail but I’m trying to make a point here).

Disengaging means you stop thinking about the problem and let your brain wander. Human brains are amazing machines. They’ll keep working for you and thinking by themselves; they just ask for a little bit of oxygen in exchange. Be alert and take note of any hints your brain may give you.

Allowing your brain to do its work can take a few minutes, hours, or even days but with time you’ll be able to exert some control over the process and find solutions in your mind, even realizing you’ve fixed the problem without writing a line of code. This has happened to me more than enough times while lying in bed in the middle of the night. A note for programmers’ spouses: this, and nothing more, is the reason why your loved one sometimes smiles in their sleep.

If it seems like you’re not getting anywhere, try starting from scratch with a new, potentially simpler, approach. Problems can be solved in different ways and you may have chosen the most complicated one to start with. Don’t try to prove (to whom?) that you can make it work your way; be pragmatic. Remember, working software can always be improved.

And there are still more cases to solve

Now you should be in a better position to come up with solutions for your software problems.

You actually need to enjoy the detective work to be good at it. Facing challenges and overcoming them is what makes programming such a fascinating experience for me, even if talking about it would bore most earthlings at those cocktail parties everybody keeps talking about.

See you soon in part three, where I’ll share more details about solving other complicated cases.