CCD Red Degree Rule - Beware of Optimizations!

Have you ever written some code that you suspected was inefficient? How about code that was ridiculously efficient but difficult to understand? Knuth (or possibly Hoare or Dijkstra) once stated that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil". Why would he say that?

One of my favourite optimizations (from my C days) is the two-variable swap operation. This little beauty takes two variables and swaps them without creating a temporary variable. In pseudocode it looks like this:

  a = a XOR b
  b = a XOR b
  a = a XOR b

That's it; it's beautiful. If you don't believe me that it works, go and try it out on paper with a and b equal to 0x10 and 0x01 respectively. You'll be surprised.

Not only is it memory efficient (as it doesn't create a temporary variable), but if a and b happen to be registers then it's lightning fast. So what is wrong with it?

Well, for one thing, it's difficult to understand. For the first couple of months, every time I came across this code I needed to reassure myself that it was actually swapping the two values. This also means that every time I tried it with a different data type (or rewrote it in a different application) I had an uneasy feeling that it might not be functioning correctly.

Secondly, processors in the x86 family have a single instruction to swap two registers anyway (XCHG), which is faster than three XOR operations. Funnily enough, many optimizing compilers can recognize a temporary-variable swap and replace it with a single XCHG instruction, but they cannot figure out what the three-XOR version does, so they leave it alone. Here I have been defeated by my own cleverness, as the compiler can optimize the obvious approach but not my fancy one.

Thirdly, on a modern processor the above code is at best a few nanoseconds faster and uses less memory by a statistically insignificant amount. The time taken to carefully hand-craft that method is a complete waste of my time and of the time of everyone who ever needs to read it.

Finally, if a and b point to the same area of memory (which is entirely possible in C) then the memory will be zeroed out instead of swapped. That can be a mighty gotcha the first time you run across it.

Incidentally, Ruby has the best swap operation I have ever seen:

  a, b = b, a

It may be inefficient under the covers but I really don't care. It gets the point across more clearly than any other language I have come across.

All optimization is a form of trade-off. We sacrifice readability or maintainability for speed or memory. It is important to realize that we can go the other way as well. Processor and memory usage are cheap in comparison to developer time, especially to the guy who signs the bill. It is usually worth the performance hit to make your code a little easier to understand.

Does this mean that there is no place for traditional optimization in the software development process? No, of course not. After all, forgetting the small inefficiencies 97% of the time leaves 3% of the time when we should worry. What constitutes that 3%?

If you suspect that your code might be running slowly, you need to ask a few questions. If you can answer yes to each of the following then it may be time to attempt an optimization:

  1. Do we know where the problem is? There is no point trying to optimize a 5-record sort which is executed once in a web request that calls the database 5 times; the database calls are the bottleneck.
  2. Can we improve it? Processing 10,000,000 records may just take a lot of time. If there's nothing you can do about it then forget it.
  3. Is the current speed acceptable to the client? If the client isn't going to notice then don't bother.
  4. Can we get the speed to within client expectations? If the client has unrealistic expectations then no amount of optimization will help you.
  5. Do we know how fast it actually is? If you don't know how fast your code runs then you won't know whether any changes you make improve performance or potentially hinder it.

That last point is probably the most important. Before you start to optimize your code you need to figure out how fast it actually is. After every change you make you need to ensure that a) it still works and b) it is faster than before. If either of those tests fails then roll back the change and try again. Don't keep an optimization that doesn't significantly improve the code. It usually is not worth the loss in readability.

It's also important that when you test the speed of your algorithm you use data that is representative of the actual data that will be provided in production. Ideally you should use a real production data set. The reason is that some algorithms are very data sensitive. Some sorting algorithms hit their worst-case performance on data which is sorted in reverse. If you know that your data is always in reverse sort order then a simple swap-based reversal will out-perform a quicksort.

That is actually quite an interesting point. You will frequently get better results by replacing the entire algorithm with a better one (in the context of your program) than you will by trying to hand-craft the one you have.

Finally, you should not use this as an excuse to write code that performs badly. If two solutions are just as easy to understand (subjective, I know) but one of them is faster, then the faster one is always preferred.

I said above that all optimization is a form of trade-off. The best way to make smart decisions about optimization is to consider that trade-off and decide if it is really worth the time and effort.

Posted by: Mike Minutillo
Last revised: 27 May, 2011 02:42 PM
