int a = 8; // just assigning a random value to clear to 0
a = 0; // setting a variable to 0 normally
Versus the 'efficient' way:
int a = 8;
a ^= a; // faster on register CPU, slower on stack CPU
If you translate that to x86 assembly code, it becomes:
MOV AX, 8
MOV AX, 0
Versus:
MOV AX, 8
XOR AX, AX ; same number of instructions, but faster
The arithmetic operation of XOR vs MOV is just faster by 2-3 bus cycles. I might be wrong on the number but the key point is that the savings are really trivial.
Here's a snippet of code generated for a stack based architecture, for example Java:
BIPUSH 8;
ISTORE_0;
ICONST_0;
ISTORE_0; // 4 instructions
Versus:
BIPUSH 8;
ISTORE_0;
ILOAD_0;
ILOAD_0;
IXOR;
ISTORE_0; // 6 instructions!
And you've just added an unnecessary overhead increase of 50%. That's sometimes why people do mumble about stack based machine architectures like Java and C#, which probably have some kernels of truth. But with trivial savings like that, it's probably relevant for old machine architectures, but for modern day RISC machine architectures, the stack slots are mapped onto registers directly, and with compiler optimizations, the differences are probably immaterial.
As a comparison, if the programmer fires up his profiler and optimize on even just the trivial-est of loops, he'll probably end up having a faster application just by doing that, than having to manually tweak every single instance of the given example, with negligible gain (or even a loss of) performance. So if you're ever looking for a good example of an unnecessary and counterproductive 'premature optimization', this is it!
0 comments:
Post a Comment