Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- Environment: macOS, Chrome, Safari
Description
While profiling in-browser decoding of the TPC-H Customer and Part tables (datasets containing many UTF-8 string columns), it turned out that much of the time was being spent in getVariableWidthBytes rather than in TextDecoder itself. Ideally, nearly all of the time would be spent in TextDecoder.
On Chrome, getVariableWidthBytes accounted for up to ~15% of the end-to-end decoding latency, and on Safari close to ~40% (Safari's TextDecoder is much faster than Chrome's, so getVariableWidthBytes takes up a relatively larger share there).
This is likely because the code in this PR is more amenable to V8's and JSC's JITs: x and y are now guaranteed to be SMIs ("small integers") rather than Objects, which allows the JIT to emit efficient machine instructions that deal only in 32-bit integers. Once V8 discovers that x and y can potentially be null (upon reading past the array bounds), it "poisons" that codepath permanently, since the generated code must then handle the null case.
See this V8 post for a more in-depth explanation (in particular see the examples underneath "Performance tips"):
https://v8.dev/blog/elements-kinds
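A minimal sketch of the pattern described above. This is not the actual PR code; the function names and the `valueOffsets`/`index` parameters are hypothetical, chosen only to illustrate why an explicit bounds check keeps the loads monomorphic on 32-bit integers:

```typescript
// Before: reading one element past a position that may be out of range.
// An out-of-range read on a TypedArray yields `undefined` at runtime, so
// the JIT must handle a non-SMI result and deoptimizes the fast path.
function implicitBounds(valueOffsets: Int32Array, index: number): number {
  const x = valueOffsets[index];     // may be out of range
  const y = valueOffsets[index + 1]; // may be out of range
  return y - x;                      // NaN when either read was out of range
}

// After: check the bounds explicitly so every read is provably in range.
// x and y are then always 32-bit integers (SMIs), and the JIT can keep
// emitting the fast integer-only machine code.
function explicitBounds(valueOffsets: Int32Array, index: number): number {
  if (index < 0 || index + 1 >= valueOffsets.length) {
    return 0;
  }
  const x = valueOffsets[index];
  const y = valueOffsets[index + 1];
  return y - x;
}
```

The behavioral difference is invisible for in-range indices; the win is that the explicit check moves the "out of range" case out of the hot loads, so the JIT never observes a non-integer value flowing through x or y.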
Doing the bounds check explicitly instead of implicitly essentially eliminates this function from the profile. Empirically, on my machine, decoding TPC-H Part dropped from 1.9s to 1.7s on Chrome, and Customer dropped from 1.4s to 1.2s.
Issue Links
- links to https://github.com/apache/arrow/pull/12793