Sunday, April 6, 2014

Fight against TEOOD – too-early-optimization-oriented-development

When should you optimize your program?

You have probably seen this many times around you: some developers love to optimize things. They want their code to run faster. They talk in ops/s (« operations per second ») and aim for the highest performance. That's probably a worthy goal, but let me explain why I think it is sometimes especially counterproductive.

Why does it matter?

I guess that (great) people want to deliver great projects. They don't want to be blamed once their program hits the first trivial bottleneck and everything grinds to a halt at a ridiculous performance rate.
They keep in mind that caring about performance turns you into a super-developer.

Super-developer, superhero of all developers. (source)

What is TEOOD?

To counter this trend, I'd like to describe my personal view of optimization-oriented-development. The name is obviously a mocking one, and I only want you to consider the following points. Needless to say, these arguments probably don't hold for truly performance-driven programs, even if some of the points still apply.

Define your goal(s): performance != optimization

Of course you're probably coding with tons of XXIInd century ideas, but have you ever thought about what is really important in your super-program? Some ideas:
  • Reliable: not failing on the first corner case it encounters. Some programs may be allowed to fail constantly, but I'd call them "proofs of concept" rather than "programs".
  • Open-sourceable: this means a lot to me. It means your program is properly designed, well documented and thoroughly tested, so that other super-developers can add features or fix your mistakes.
  • Simple: this matters especially when solving simple problems. Do you really need 1,000 lines of C++ code to find the maximum value among 1,000 integers? Sure, your program can now compute min/avg/max while saving precious CPU cycles, but does it really matter?
Does "Hello World" really need your state-of-the-art GPU optimization? (source)
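To make the "Simple" point concrete, here is a minimal Java sketch of finding the maximum among 1,000 integers with the standard `Arrays.stream` API (the class name `SimpleMax` and the random input are made up for illustration). If min/avg/max really are needed, the JDK's `IntSummaryStatistics` already computes them in a single pass, so there is no need for a hand-rolled clever loop.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.Random;

final class SimpleMax {
    // The whole "algorithm": one readable line backed by the JDK.
    // getAsInt() throws NoSuchElementException if the array is empty.
    static int max(int[] values) {
        return Arrays.stream(values).max().getAsInt();
    }

    // If min/avg/max are really needed, the JDK computes them in one pass.
    static IntSummaryStatistics stats(int[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    public static void main(String[] args) {
        // 1,000 pseudo-random integers in [0, 1_000_000), fixed seed for repeatability.
        int[] values = new Random(42).ints(1_000, 0, 1_000_000).toArray();
        System.out.println("max = " + max(values));
        IntSummaryStatistics s = stats(values);
        System.out.println("min = " + s.getMin() + ", avg = " + s.getAverage() + ", max = " + s.getMax());
    }
}
```

A few lines, no tricks, and easy for anyone to review or extend.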

I guess you'll easily find something in this list that is more important than raw performance. And that's the point: performance IS NOT optimization. Don't focus too much on performance. Stay away from optimizations until you've hit a critical bottleneck. Without even invoking the famous 80/20 rule, I would say that if you get ridiculous performance, this is probably not an optimization problem but a performance one. You've missed something trivial, and it is most likely related to the global design or to a crappy, overly complex method, not to the fact that you used a deprecated API or iterated over your ArrayList twice instead of once. And if I'm wrong, guess what? Because you've designed your program properly, you'll easily find the awfully costly function call(s) with a simple profiling tool. And I'm pretty sure the fix will remain quite simple (caching, buffering, multi-threading, etc.).
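As an illustration of how simple such a fix can stay, here is a minimal Java sketch of caching: a generic `memoize` wrapper around a plain `HashMap`. The `costlyLookup` method is hypothetical and only stands in for whatever call the profiler points at. The cache is unbounded and not thread-safe, which is fine for a sketch but worth documenting in real code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class Memo {
    // Generic memoization: each distinct key is computed once, then served from the map.
    // Unbounded and not thread-safe: acceptable for a sketch, worth documenting in real code.
    static <K, V> Function<K, V> memoize(Function<K, V> f) {
        Map<K, V> cache = new HashMap<>();
        return key -> cache.computeIfAbsent(key, f);
    }

    static int calls = 0; // counts real executions of the costly call

    // Hypothetical "costly call" standing in for whatever the profiler flagged.
    static long costlyLookup(String key) {
        calls++;
        long h = 0;
        for (int i = 0; i < 1_000_000; i++) {
            h += key.hashCode() ^ i;
        }
        return h;
    }

    public static void main(String[] args) {
        Function<String, Long> lookup = memoize(Memo::costlyLookup);
        for (int i = 0; i < 100; i++) {
            lookup.apply("customer-42");
        }
        System.out.println("costlyLookup executed " + calls + " time(s)");
    }
}
```

Running `main` reports a single execution of `costlyLookup` despite 100 calls, because every later call hits the cache. The rest of the program does not change at all.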

Now it's time to tell you what I have in mind: take the time to think about the overall design. About how your program will live or die. About what you're trying to do: implementing an algorithm? adding abstraction? automating something? You've probably noticed that successful projects get refactored because of design or API issues, rarely because of ugly performance issues.

In TEOOD, I would advise developers to consider performance troubles at most, but never to think about optimized calls while they are still unable to provide a stable, solid design, or to point out such issues in their unit tests or overall benchmarks.

That's cool, so I can write O(2^n) algorithms?

That's not the point. Actually, even in this extreme scenario, I think it might be totally acceptable if you're aware of the limitations, you don't need to scale right now, the choice is documented, and it replaces 7 days of hard work and thousands of lines of code with a single simple instruction that took just 5 minutes of your precious time.
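Here is a minimal Java sketch of what "aware of the limitations and documented" could look like, using brute-force subset sum as a deliberately exponential example. The problem choice and the limit of 20 elements are assumptions for illustration, not taken from a real project. The exponential cost is stated in the Javadoc and enforced by a guard, so nobody gets surprised when the input grows.

```java
final class SubsetSum {
    // Assumed, documented ceiling: 2^20 is about one million subsets, still quick.
    static final int MAX_N = 20;

    /**
     * Returns true if some subset of {@code values} sums to {@code target}.
     *
     * <p>KNOWN LIMITATION: brute force over all 2^n subsets, i.e. O(n * 2^n) time.
     * Deliberately simple; inputs larger than {@link #MAX_N} are rejected instead
     * of silently running practically forever.
     */
    static boolean hasSubsetWithSum(int[] values, long target) {
        int n = values.length;
        if (n > MAX_N) {
            throw new IllegalArgumentException(
                    "n = " + n + " exceeds the documented limit of " + MAX_N);
        }
        for (int mask = 0; mask < (1 << n); mask++) { // each bit pattern is one subset
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sum += values[i];
                }
            }
            if (sum == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] values = {3, 34, 4, 12, 5, 2};
        System.out.println(hasSubsetWithSum(values, 9));  // true (4 + 5)
        System.out.println(hasSubsetWithSum(values, 30)); // false
    }
}
```

If the limit ever becomes a real bottleneck, the method signature can stay the same while the body is swapped for a smarter algorithm.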

If you're writing a program to solve problems, not performance problems, this article is probably for you. Plan your scaling options, both horizontal and vertical (but keep in mind that scaling too early is also a very bad practice). Plan a tolerance margin if you think you're doing unoptimized things. Think about the complexity you're really fighting against, and define the bottlenecks you're ready to hit.

Don't think too fast, speed is not your main concern!

For instance, if you're writing an ETL that reads various file formats and writes to various databases, you should NOT worry about performing many serialization/deserialization operations in your workflow, at least when designing the first parts. Even more so: don't waste your time comparing different serialization methods for every possible case. Even if you're converting String to byte[] and byte[] back to String 42 times in your Java program, you'll hold all the cards to test (benchmark) the limit and fix it sooner or later. Your application is probably not even CPU-bound but IO-bound, so wasting many CPU cycles in the first draft is not a disaster. At this point, just think about the overall design. Make the pipeline smooth, explicit and easy to maintain for you and for others. This way you'll be able to optimize the right thing at the right time.
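A minimal Java sketch of such a pipeline, assuming records are plain `String`s: each hypothetical stage (trim, upper-case, separator replacement) deliberately round-trips through `byte[]` as if crossing a serialization boundary, and the stages are chained with `Function.andThen` so the workflow stays explicit. The `System.nanoTime` timing is only a crude first check of the limit; for trustworthy numbers a dedicated harness such as JMH is the better tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class EtlPipeline {
    // Wraps a transformation in a deliberately wasteful String -> byte[] -> String round trip,
    // mimicking a serialization boundary between two ETL steps.
    static Function<String, String> stage(Function<String, String> transform) {
        return input -> {
            byte[] wire = input.getBytes(StandardCharsets.UTF_8);      // "serialize"
            String decoded = new String(wire, StandardCharsets.UTF_8); // "deserialize"
            return transform.apply(decoded);
        };
    }

    // The whole workflow is one explicit chain of hypothetical stages.
    static final Function<String, String> PIPELINE =
            stage(String::trim)
                    .andThen(stage(s -> s.toUpperCase(Locale.ROOT)))
                    .andThen(stage(s -> s.replace(';', ',')));

    public static void main(String[] args) {
        List<String> records = Arrays.asList("  alice;42 ", " bob;7");
        int rounds = 100_000;
        long checksum = 0; // consume results so the work is not trivially discarded
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String r : records) {
                checksum += PIPELINE.apply(r).length();
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(PIPELINE.apply(records.get(0))); // ALICE,42
        System.out.println((rounds * records.size()) + " records in ~" + elapsedMs
                + " ms (checksum " + checksum + ")");
    }
}
```

If the round trips ever show up as the real bottleneck, removing or replacing them touches one small method, not the whole workflow.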

I could have called this article "swimming slowly vs sinking fast": caring about optimizations from the very beginning will inevitably force you to dive deeper and deeper into the source code, continuously turning knobs left and right (note that this applies to benchmark addicts as well).

Once you do hit serious performance issues and you don't want to scale, fixing them probably won't take long if you've kept the whole program clean, simple, and well designed. On the contrary, diving into a complex-because-faster program could lead to many unsafe tricks, perhaps introducing new, unknown bugs through side effects.


TL;DR. If I had to describe TEOOD in a few points:
  • Stay focused on what you're doing: solving an algorithmic problem? automating something?
  • Take care of your design; performance will follow.
  • Think (at most) about performance, not about optimizations.
  • Know the limits, document them.
  • Define which bottlenecks are important and how to surpass them.