Description
We can write SIMD functions using Intel's intrinsics (in <immintrin.h>) now, but we have to use
__attribute__((__target__("avx2")));
at the end of each function declaration. Functions marked this way can't be inlined into callers that are not compiled for the same target, which can hurt performance quite a bit.
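For illustration, a minimal sketch of the pattern described above, assuming GCC or Clang on x86-64. The function names (AddInt32x8, AddInt32x8Scalar, AddInt32x8Dispatch) are hypothetical and not taken from the codebase. The runtime check with __builtin_cpu_supports is an addition for safety and is not something this ticket prescribes.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical example function. The target attribute sits at the end of
// the declaration, so only this function is compiled with AVX2 enabled.
void AddInt32x8(const int32_t* a, const int32_t* b, int32_t* out)
    __attribute__((__target__("avx2")));

// The definition picks up the attribute from the declaration above.
// It cannot be inlined into callers built without AVX2.
void AddInt32x8(const int32_t* a, const int32_t* b, int32_t* out) {
  __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
  __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(out),
                      _mm256_add_epi32(va, vb));
}

// Scalar fallback (also hypothetical) for CPUs without AVX2.
void AddInt32x8Scalar(const int32_t* a, const int32_t* b, int32_t* out) {
  for (int i = 0; i < 8; ++i) out[i] = a[i] + b[i];
}

// Caller compiled for the baseline target: the call to AddInt32x8 stays an
// out-of-line call, which is the inlining cost mentioned above.
void AddInt32x8Dispatch(const int32_t* a, const int32_t* b, int32_t* out) {
  if (__builtin_cpu_supports("avx2")) {
    AddInt32x8(a, b, out);
  } else {
    AddInt32x8Scalar(a, b, out);
  }
}
```

In a hot loop, the out-of-line call in AddInt32x8Dispatch is paid on every iteration, so moving the whole loop into the AVX2-targeted function is one way to reduce that overhead.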
We should populate sse-util.h with the intrinsics we are using. A list can be found via
grep -rIohE '_mm[^( ]*' be/src/ --include '*.h' --include '*.cc' | sort | uniq -c | sort -n