Monday, November 1, 2010

Sometimes you need to make a mess

Edit: This is a follow-up to a blog post by my colleague Andy Maleh. I wanted to expand on his interpretation and add my own thoughts.

What do we mean when we say some code is a mess? We're usually referring to its structure, or lack thereof. Interactions between components are awkward and poorly named. Logic often leaks between layers, causing odd couplings and generally making our lives a pain. Software like that is certainly a mess, and that sort of mess is usually traceable to poor discipline on the part of the programmers or poor management of the team. We know these messes well.

However, that's not the kind of mess I want to talk about. I want to talk about the kind of mess that happens while you're learning about a problem space by programming in it. These are flaws that you intend to clean up once you've wrapped your head fully around the problem, but you're quite incapable of doing so until you *do* understand what's going on.

Would you say code in that state is messy? I would, but I think these messes are actually natural and good. Trying to avoid all kinds of messes early can be a mistake.

It is premature optimization of the design to force structure into the system when your understanding of the problem is too weak to support it.

Here's an example. I wanted to build a pure-Ruby PNG spriting library to use in our web applications. It would take multiple PNG images, smash them together into one "sprite" PNG, and generate the CSS needed to display each of the images it contains. I had never done any of this before. I didn't know how to parse a PNG or splice PNGs together. I had a lot to learn to solve this problem.
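To make the CSS half of that description concrete, here is a minimal sketch of how a sprite's layout and stylesheet might be computed. It assumes the simplest possible layout, with images stacked in a single column, and every name in it (`SpriteImage`, `sprite_css`, the `.sprite-NAME` class scheme, `sprite.png`) is hypothetical rather than taken from the actual library.

```ruby
# Hypothetical description of one input image; a real library would read
# the width and height from the PNG itself.
SpriteImage = Struct.new(:name, :width, :height)

# Stacks images top to bottom in a single column and returns
# [sprite_width, sprite_height, css].
def sprite_css(images, sprite_url: "sprite.png")
  offset = 0
  rules = images.map do |img|
    # A negative y position shifts the sprite up so this image shows through.
    y = offset.zero? ? "0" : "-#{offset}px"
    rule = ".sprite-#{img.name} { " \
           "background: url(#{sprite_url}) 0 #{y} no-repeat; " \
           "width: #{img.width}px; height: #{img.height}px; }"
    offset += img.height # the next image starts directly below this one
    rule
  end
  sprite_width = images.map(&:width).max || 0
  [sprite_width, offset, rules.join("\n")]
end

width, height, css = sprite_css([
  SpriteImage.new("home", 16, 16),
  SpriteImage.new("search", 24, 20)
])
puts "sprite is #{width}x#{height}" # sprite is 24x36
puts css
```

Stacking vertically keeps the arithmetic trivial: each image's y-offset is just the sum of the heights above it, and the sprite is as wide as the widest image.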

What did I do? First, I found the official PNG documentation, then a Ruby PNG library I could use as a reference. Then I made a gigantic mess trying to get it all to work. The mess was a bit like laying all the puzzle pieces out on the floor before you start assembling the puzzle. I had some functional tests and some code that could parse a PNG, but it was far from its ideal form. Once I had something basic working, even though the code was a disaster, I could start to refactor. More importantly, by not getting hung up on the optimal design, I had allowed myself to explore the problem space; I could worry about the design later. Whenever I understood the solution well enough to feel good about introducing a new concept, I did so and refactored the code into its new home.
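Parsing a PNG is one of the more approachable parts of the problem, because the format is a fixed 8-byte signature followed by a sequence of chunks, each laid out as a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over the type and data. The following is a minimal sketch of that first "parse a PNG" step using only Ruby's standard library. `read_chunks` and `header` are hypothetical names, not the library's real API, and the sketch does not decompress the image data.

```ruby
require "zlib"

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

Chunk = Struct.new(:type, :data)

# Splits raw PNG bytes into chunks, verifying the signature and each chunk's CRC.
def read_chunks(bytes)
  bytes = bytes.b
  raise ArgumentError, "not a PNG" unless bytes.start_with?(PNG_SIGNATURE)

  pos = PNG_SIGNATURE.bytesize
  chunks = []
  while pos + 8 <= bytes.bytesize
    # 4-byte big-endian length, then the 4-byte ASCII chunk type.
    length, type = bytes.byteslice(pos, 8).unpack("Na4")
    raise ArgumentError, "truncated #{type} chunk" if pos + 12 + length > bytes.bytesize

    data = bytes.byteslice(pos + 8, length)
    crc = bytes.byteslice(pos + 8 + length, 4).unpack1("N")
    # The CRC covers the type and data fields, not the length.
    raise ArgumentError, "bad CRC in #{type} chunk" unless Zlib.crc32(type + data) == crc

    chunks << Chunk.new(type, data)
    pos += 12 + length
    break if type == "IEND"
  end
  chunks
end

# IHDR must be the first chunk; it holds the image dimensions and pixel format.
def header(chunks)
  ihdr = chunks.first
  raise ArgumentError, "missing IHDR" unless ihdr&.type == "IHDR"

  width, height, bit_depth, color_type = ihdr.data.unpack("NNCC")
  { width: width, height: height, bit_depth: bit_depth, color_type: color_type }
end
```

Reading a file from disk would look something like `header(read_chunks(File.binread("icon.png")))`, where `icon.png` stands in for any PNG.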

The end result was a nice little spriting library. How does this technique work? How can you make a big mess without throwing all the code away in the end? I've certainly done that before. Well, most importantly, you need to write the right kinds of tests. Whenever I'm working in a problem space I don't feel comfortable with, I always start my testing at the highest possible level; most of the time, this means integration tests. I do this because integration tests give me more freedom to refactor: they exercise the application only through its outermost interface, so they don't care how the code behind that interface is arranged.

A system that is easy to refactor doesn't punish me as much for making mistakes, and I certainly make a lot of them. A good suite of integration tests shouldn't break unless you've truly broken the application, which is very useful. In the case of the spriting library, I started by writing a test that expressed how I wanted to open a PNG and inspect its data. Once that test passed, I would refactor whatever I thought I understood well enough and move on to the next test. When I got stuck, I tried to think up an easier test that didn't require me to take on so much of the problem. So, first and foremost, when I have a really awesome set of integration tests, I'm never afraid to attack a mess. I know the test suite has my back.
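As an illustration of testing from the outside in, the sketch below uses Minitest to write a real PNG to a temporary file and then checks only what the public entry point reports. `Png.open`, `#width`, and `#height` are hypothetical names for whatever the top-level API turns out to be. Because the test never touches chunk parsing directly, everything behind `Png.open` can be torn apart and reorganized without breaking it.

```ruby
require "minitest/autorun"
require "tempfile"
require "zlib"

# Hypothetical public entry point. The test below only calls Png.open,
# #width and #height, so the internals are free to change during refactoring.
class Png
  SIGNATURE = "\x89PNG\r\n\x1A\n".b

  attr_reader :width, :height

  def self.open(path)
    bytes = File.binread(path)
    raise ArgumentError, "not a PNG" unless bytes.start_with?(SIGNATURE)
    # After the 8-byte signature, the IHDR chunk starts with a 4-byte length
    # and the 4-byte type; width and height are the next two 32-bit integers.
    raise ArgumentError, "missing IHDR" unless bytes.byteslice(12, 4) == "IHDR"

    width, height = bytes.byteslice(16, 8).unpack("NN")
    new(width, height)
  end

  def initialize(width, height)
    @width = width
    @height = height
  end
end

class PngIntegrationTest < Minitest::Test
  # Writes a complete 8-bit RGBA PNG to disk so the test covers file I/O and parsing together.
  def write_png(width, height)
    chunk = ->(type, data) { [data.bytesize].pack("N") + type + data + [Zlib.crc32(type + data)].pack("N") }
    ihdr = [width, height, 8, 6, 0, 0, 0].pack("NNCCCCC")
    row = "\x00".b + ("\xFF".b * 4 * width) # filter byte + opaque white pixels
    idat = Zlib::Deflate.deflate(row * height)
    file = Tempfile.new(["sprite", ".png"])
    file.binmode
    file.write(Png::SIGNATURE + chunk.("IHDR", ihdr) + chunk.("IDAT", idat) + chunk.("IEND", ""))
    file.close
    file
  end

  def test_reports_dimensions_of_a_png_on_disk
    file = write_png(2, 1)
    png = Png.open(file.path)
    assert_equal 2, png.width
    assert_equal 1, png.height
  ensure
    file&.unlink
  end
end
```

If the parsing code later moves into separate chunk and header classes, this test should keep passing unchanged, which is exactly the freedom the high-level tests are meant to buy.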

Second, I need to be really good at refactoring. As I understand more of the problem, I need to be able to take all these mistakes and turn them into solid code. Another important aspect of refactoring is knowing when the mess is getting too big: I need to understand when a refactoring is necessary as much as I need to understand how to do it. I try to do this by always keeping the model of the solution in my head in sync with the solution in the code. As I learn and as the code starts making sense, I change it to match that new knowledge. That way, when I come back to a problem, I can tell pretty clearly where I need to focus my learning: wherever the code is messiest.

Therefore, I endeavor to let a mess live only as long as my ignorance of the problem space.