vector_size attribute. For example, a C function that simply adds two vectors of 4 floats will produce this output:
_addv4:
addps %xmm1, %xmm0
ret
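The post's C source for this function is not reproduced here. The following is a minimal sketch of what it might look like, assuming the GCC vector_size extension. The typedef name v4sf is hypothetical; addv4 matches the label in the listing above.

```c
/* Hypothetical type name: a vector of 4 floats (16 bytes), the width of one XMM register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Lane-wise addition of two 4-float vectors; with SSE enabled GCC can emit a single addps. */
v4sf addv4(v4sf a, v4sf b)
{
    return a + b;
}
```

Compiling this with gcc -S -O2 and reading the generated .s file is one way to compare against the listing above.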
Impressive! The arguments are passed in XMM registers, so there is not even a single read or write to memory!
Should I worry about my job? Will compilers finally outsmart humans at producing good assembly output? Not yet... Take another, very similar example: we just want vectors of 3 floats instead of 4, which is natural enough when living in a 3D space. GCC gives this error message:
error: number of components of the vector not a power of two
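GCC requires the number of vector components to be a power of two, so a 3-float (12-byte) vector is rejected. One common workaround, which is an assumption here and not something the post uses, is to store a 3D vector in a 4-float vector and keep the fourth lane at zero. The names vec3, vec3_make and vec3_add in this sketch are hypothetical.

```c
/* Hypothetical 3D vector type: 4 float lanes (16 bytes); the fourth lane is padding. */
typedef float vec3 __attribute__((vector_size(16)));

/* Build a 3D vector with the padding lane set to 0. */
static inline vec3 vec3_make(float x, float y, float z)
{
    vec3 v = {x, y, z, 0.0f};
    return v;
}

/* Lane-wise addition; the padding lane stays 0 because 0 + 0 == 0. */
static inline vec3 vec3_add(vec3 a, vec3 b)
{
    return a + b;
}
```

The trade-off is one unused float per vector (16 bytes instead of 12).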
Oops! Good thing I don't get this message from malloc() when I want to reserve, let's say, 95 bytes. Next, look at the following simple function:
Compiling with
gcc -S -O3 -msse3 -fomit-frame-pointer -foptimize-register-move
gives assembly containing several movaps instructions. What is disturbing for me is that these register moves are not optimized. Basically, all the movaps instructions could be optimized away if the floats were packed in the right registers. It seems that for performance-critical applications or code segments we'd better do it manually, in assembly or using intrinsics.
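For comparison, here is a sketch of the same 4-float addition written with SSE intrinsics from <xmmintrin.h>. The function name add4_sse is hypothetical. Each intrinsic corresponds closely to one SSE instruction (_mm_add_ps to addps), although register allocation is still left to the compiler.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: adds two arrays of 4 floats with a single SSE addition. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);     /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);     /* unaligned load of b[0..3] */
    __m128 vs = _mm_add_ps(va, vb);  /* addps: lane-wise a[i] + b[i] */
    _mm_storeu_ps(out, vs);          /* unaligned store to out[0..3] */
}
```

Unaligned loads and stores are used so the helper works on any float array. With data known to be 16-byte aligned, _mm_load_ps and _mm_store_ps would be the aligned alternatives.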
