@arithma: Talking about optimization or performance issues in general implies a focus on asymptotic behavior, especially with such powerful computers widely available on the market.
This has been a trend here lately. Developers shouldn't discuss development from their own narrow perspective unless they declare it as such. For some people an hour is a blink; for others 10 ms is a lifetime. In both situations there are cases where asymptotic analysis doesn't give you the performance insight you need.
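A toy illustration of that last point, under two made-up cost models (the constant factor 100 is an assumption, not a measurement): an O(n log n) algorithm with a large constant against an O(n^2) algorithm with a small one. Below the crossover size, the asymptotically "worse" algorithm is the cheaper one.

```python
import math


def cost_nlogn(n: int) -> float:
    # Hypothetical O(n log n) algorithm with a large constant factor (100).
    return 100 * n * math.log2(n)


def cost_quadratic(n: int) -> float:
    # Hypothetical O(n^2) algorithm with a constant factor of 1.
    return float(n * n)


def crossover(start: int = 2) -> int:
    """Smallest n >= start where the quadratic model costs more than the n log n one."""
    n = start
    while cost_quadratic(n) <= cost_nlogn(n):
        n += 1
    return n


n = crossover()
for size in (10, 100, n, 10 * n):
    winner = "O(n^2)" if cost_quadratic(size) < cost_nlogn(size) else "O(n log n)"
    print(f"n={size}: cheaper model is {winner}")
```

With these hypothetical constants the crossover lands around n = 1000, so for anything smaller the quadratic algorithm wins, which the big-O labels alone would not tell you.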
When anyone, you included, opens a topic about performance, it usually means that performance actually matters, as in performance-critical applications (like a signal-processing system that has to run in real time with strictly deterministic timing).
Another example: you're writing the floating-point division algorithm in VHDL for your microprocessor.
20 ms may be a lifetime for some applications (like GPU workloads). In real-time applications (medical imaging, gaming, simulations, ...) the time available to execute all of your code for one frame is at most about 33 ms (for 30 FPS, which is low) or about 16.7 ms for 60 FPS, which is now standard.
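The frame-budget arithmetic as a small sketch: the budget per frame is simply 1000 ms divided by the target frame rate. The per-stage costs in the usage lines are hypothetical numbers chosen for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_in_frame(stage_costs_ms: list[float], fps: float) -> bool:
    """True if all per-frame work (summed) fits within the frame budget."""
    return sum(stage_costs_ms) <= frame_budget_ms(fps)


for fps in (30, 60):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")

# Hypothetical per-frame stages: 20 ms of processing plus 4 ms of rendering.
print(fits_in_frame([20.0, 4.0], 30))  # True  (24 ms <= 33.3 ms)
print(fits_in_frame([20.0, 4.0], 60))  # False (24 ms >  16.7 ms)
```

A single 20 ms stage that is perfectly acceptable at 30 FPS already misses every 60 FPS deadline on its own.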
What I was trying to say earlier is that recursion does not necessarily imply a performance loss, and an iterative approach will often end up emulating recursion with an explicit stack: then you're either getting it wrong, generating more machine code, or creating something additional to maintain. In those not-too-unusual cases there is no trade-off; recursion is the better option on all counts.
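A minimal sketch of what "emulating recursion" usually looks like, using an in-order binary tree traversal as an assumed example (Node, inorder_recursive and inorder_iterative are illustrative names, not from any library). The iterative version has to manage by hand the stack that the recursive version gets for free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_recursive(root: Optional[Node]) -> list[int]:
    """In-order traversal; the call stack remembers where we are."""
    out: list[int] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        visit(node.left)
        out.append(node.value)
        visit(node.right)

    visit(root)
    return out


def inorder_iterative(root: Optional[Node]) -> list[int]:
    """Same traversal, with an explicit stack emulating the call stack."""
    out: list[int] = []
    stack: list[Node] = []
    node = root
    while stack or node is not None:
        while node is not None:   # descend left, "pushing a frame" per node
            stack.append(node)
            node = node.left
        node = stack.pop()        # "return" to the most recent pending frame
        out.append(node.value)
        node = node.right         # then handle the right subtree
    return out


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(inorder_recursive(tree))  # [1, 2, 3, 4, 5, 6, 7]
print(inorder_iterative(tree))  # [1, 2, 3, 4, 5, 6, 7]
```

One caveat specific to this sketch: CPython caps recursion depth (see sys.getrecursionlimit()), so for very deep trees the recursive version fails in Python even though its code is simpler. That limit is a property of the runtime, not of recursion in general.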
However, shoehorning a for loop into a recursive algorithm can make obvious parallelizing transformations harder to spot (either by eye or with a profiler). This becomes an issue when you develop iteratively: first for correctness, then profiling and fixing bottlenecks.
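A sketch of the parallelization point, assuming a hypothetical per-item function work() whose calls are independent of each other. In the loop form that independence is plain to see, and turning it into a parallel map is a one-line change; in the recursive form every addition waits on the next call, which hides the opportunity.

```python
from concurrent.futures import ThreadPoolExecutor


def work(x: int) -> int:
    # Hypothetical stand-in for an independent, per-item computation.
    return x * x


def total_recursive(items: list[int], i: int = 0) -> int:
    # Loop shoehorned into recursion: each addition waits on the next call,
    # so the independence of the items is no longer obvious.
    if i == len(items):
        return 0
    return work(items[i]) + total_recursive(items, i + 1)


def total_loop(items: list[int]) -> int:
    # Plain loop: clearly one independent work() call per item.
    return sum(work(x) for x in items)


def total_parallel(items: list[int], workers: int = 4) -> int:
    # The loop form turns into a parallel map with a one-line change.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, items))


data = list(range(100))
print(total_recursive(data), total_loop(data), total_parallel(data))  # 328350 328350 328350
```

ThreadPoolExecutor only illustrates the structural change here: in CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so a real speedup would need ProcessPoolExecutor or a different runtime. The recursive version also runs into Python's recursion limit for long lists.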
All of this means nothing when it comes to, for example, web development. No matter how fast your code is, you still have to wait for the database to reply and return your data. PHP takes advantage of that and can get away with being orders of magnitude slower than it could be if it were optimized.
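The database argument is essentially Amdahl's law. A back-of-the-envelope sketch, with hypothetical timings (50 ms waiting on the database, 5 ms of application code) rather than measured ones:

```python
def request_time_ms(db_wait_ms: float, code_ms: float, code_speedup: float = 1.0) -> float:
    """Total request latency when only the application code gets faster."""
    # The database wait is untouched by any optimization of our own code.
    return db_wait_ms + code_ms / code_speedup


def overall_speedup(db_wait_ms: float, code_ms: float, code_speedup: float) -> float:
    """How much faster the whole request gets for a given code speedup."""
    before = request_time_ms(db_wait_ms, code_ms)
    after = request_time_ms(db_wait_ms, code_ms, code_speedup)
    return before / after


# Hypothetical timings: 50 ms database wait, 5 ms of application code.
print(request_time_ms(50, 5))      # 55.0
print(request_time_ms(50, 5, 10))  # 50.5
print(overall_speedup(50, 5, 10))
```

With those assumed numbers, making the code 10x faster only takes the request from 55 ms to 50.5 ms, and even an infinitely fast code path cannot get below the 50 ms database wait, so the overall speedup stays under 1.1x.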
[edit] Just checked the link, and I have to say it's a poor comparison. I'll have more to say after I get back home.