Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
None
Description
We are running benchmarks on the Arrow AVX512 build, and perf shows unpack1_32 as the major hotspot for the BM_PlainDecodingBoolean benchmark.
Implementing this function with intrinsics shows a big improvement. See below the results on a CLX 8280 CPU, which supports AVX512.
Indicator | default SSE build | AVX512 build | AVX512 build + intrinsics | Intrinsics improvement (ratio over AVX512 build) |
BM_PlainDecodingBoolean/1024(G/s) | 1.55394 | 3.77701 | 5.02805 | 1.331224964 |
BM_PlainDecodingBoolean/4096(G/s) | 1.83472 | 5.3826 | 8.3443 | 1.550235945 |
BM_PlainDecodingBoolean/32768(G/s) | 2.00957 | 6.1258 | 10.3793 | 1.694358288 |
BM_PlainDecodingBoolean/65536(G/s) | 2.02249 | 6.20035 | 10.5778 | 1.706000468 |
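
For illustration only, here is a minimal sketch of what an intrinsics version of unpack1_32 could look like. It is not the patch attached to this issue. It assumes the function reads one 32-bit word and writes each of its 32 bits, least significant first, into its own uint32_t output slot, then returns the advanced input pointer. The names unpack1_32_scalar, unpack1_32_avx512 and unpack1_32_sketch are hypothetical, and the AVX512 path is only compiled when the compiler defines __AVX512F__ (for example with -mavx512f).

```cpp
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Scalar reference: expand the 32 bits of *in into 32 uint32_t values (0 or 1).
// Assumption: bit i of the input word goes to out[i].
inline const uint32_t* unpack1_32_scalar(const uint32_t* in, uint32_t* out) {
  const uint32_t word = *in;
  for (int i = 0; i < 32; ++i) {
    out[i] = (word >> i) & 1u;
  }
  return in + 1;
}

#if defined(__AVX512F__)
// AVX512F sketch: broadcast the word to 16 lanes, shift each lane right by its
// own bit index, mask with 1, and store. Two 512-bit stores cover all 32 outputs.
inline const uint32_t* unpack1_32_avx512(const uint32_t* in, uint32_t* out) {
  const __m512i word = _mm512_set1_epi32(static_cast<int>(*in));
  const __m512i ones = _mm512_set1_epi32(1);
  // _mm512_set_epi32 takes lanes from highest (e15) to lowest (e0).
  const __m512i shifts_lo = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8,
                                             7, 6, 5, 4, 3, 2, 1, 0);
  const __m512i shifts_hi = _mm512_set_epi32(31, 30, 29, 28, 27, 26, 25, 24,
                                             23, 22, 21, 20, 19, 18, 17, 16);
  const __m512i lo = _mm512_and_si512(_mm512_srlv_epi32(word, shifts_lo), ones);
  const __m512i hi = _mm512_and_si512(_mm512_srlv_epi32(word, shifts_hi), ones);
  _mm512_storeu_si512(out, lo);       // out[0..15]
  _mm512_storeu_si512(out + 16, hi);  // out[16..31]
  return in + 1;
}
#endif

// Compile-time dispatch: use the AVX512 path when available, else the scalar one.
inline const uint32_t* unpack1_32_sketch(const uint32_t* in, uint32_t* out) {
#if defined(__AVX512F__)
  return unpack1_32_avx512(in, out);
#else
  return unpack1_32_scalar(in, out);
#endif
}
```

The vector path replaces 32 dependent scalar shift-and-mask steps with two variable-shift instructions and two unaligned stores, which is consistent with the kind of gain reported in the table, though the numbers above come from the actual patch and not from this sketch.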