SSE is worth the effort
November 8, 2006

Simplex noise is a rather nice type of deterministic, band-limited noise. Think of it as Perlin noise, just better. But although it is relatively fast, generating huge amounts of noise can still take a while. Since the algorithm itself doesn’t leave much room for improvement, I chose the low-level path and used Intel’s SSE to accelerate it. There’s no need to write x86 assembly for this, since Intel defined a set of intrinsics for C and C++. These functions correspond almost directly to the CPU’s instruction set, but you don’t have to deal with register allocation. It’s more like the built-in “+” operator for integers: it maps to a single instruction, but the compiler is responsible for moving the operands to the right place. Once you get used to writing “_mm_add_ps(a,b)” instead of “a+b”, you can work with 4-tuples of floats at almost the same cost as single values. The code is a bit hard to read afterwards, but it’s worth the effort: calculating a single 3D simplex noise value is three times faster when using SSE on a Core 2 Duo. This is still just a constant factor, but if this is your bottleneck, you get three times the performance for about 30 lines of ugly code. I think this is a fair trade.
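To give a feel for what the intrinsics style looks like, here is a minimal sketch, not the actual code from my noise function. It assumes the radial falloff term used by common 3D simplex noise reference implementations, max(0, 0.6 - x² - y² - z²) raised to the fourth power, and evaluates it for all four corners of a simplex in one go. The function names falloff_scalar and falloff4 and the 0.6 constant are assumptions made up for this illustration.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Scalar version: falloff for one corner offset (x, y, z). */
float falloff_scalar(float x, float y, float z)
{
    float t = 0.6f - (x * x + y * y + z * z);  /* 0.6 is an assumed radius constant */
    if (t < 0.0f)
        t = 0.0f;
    t *= t;         /* t^2 */
    return t * t;   /* t^4 */
}

/* SSE version: the same falloff for four corners at once.
 * x, y, z and out each point to four floats (no alignment required). */
void falloff4(const float *x, const float *y, const float *z, float *out)
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 vz = _mm_loadu_ps(z);

    /* r2 = x*x + y*y + z*z, computed per lane */
    __m128 r2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(vx, vx),
                                      _mm_mul_ps(vy, vy)),
                           _mm_mul_ps(vz, vz));

    /* t = max(0, 0.6 - r2); _mm_max_ps replaces the scalar branch */
    __m128 t = _mm_max_ps(_mm_sub_ps(_mm_set1_ps(0.6f), r2),
                          _mm_setzero_ps());

    t = _mm_mul_ps(t, t);                   /* t^2 */
    _mm_storeu_ps(out, _mm_mul_ps(t, t));   /* t^4 */
}
```

Each line of the scalar version turns into one or two intrinsic calls, and the branch becomes a per-lane maximum. That is roughly the readability cost I mean: the arithmetic is the same, it just no longer looks like arithmetic.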