Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Implemented
-
1.0.1
Description
String decoding from arrow tables is a major bottleneck in using arrow in Javascript–it can take a second to decode a million rows. For utf-8 types, I'm not sure what could be done; but some memoization would help utf-8 dictionary types.
Currently, the javascript implementation decodes a utf-8 string every time you request an item from a dictionary with utf-8 data. If arrow cached the decoded strings to a native js Map, routine operations like looping over all the entries in a text column might be on the order of 10x faster. Here's an observable notebook benchmarking that and a couple other strategies.
I would file a pull request, but 1) I would have to learn some typescript to do so, and 2) this idea may be undesirable because it creates new objects that will increase the memory footprint of a table, rather than just using the typed arrays.
Some discussion of how the real-world issues here affect the arquero project is here.
Attachments
Issue Links
- links to