Apache Arrow: ARROW-10220

[JS] Cache JavaScript UTF-8 dictionary keys?


Details

    Description

      String decoding from Arrow tables is a major bottleneck when using Arrow in JavaScript: it can take a second to decode a million rows. For plain UTF-8 columns I'm not sure what could be done, but some memoization would help dictionary-encoded UTF-8 columns.

      Currently, the JavaScript implementation decodes a UTF-8 string every time you request an item from a dictionary with UTF-8 data. If Arrow cached the decoded strings in a native JS Map, routine operations such as looping over every entry in a text column might be on the order of 10x faster. Here's an Observable notebook benchmarking that and a couple of other strategies.
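A minimal TypeScript sketch of the proposed memoization, assuming a simplified model of a Utf8 dictionary rather than the actual arrow-js classes: `Utf8Buffers`, `MemoizedUtf8Dictionary`, and `encodeUtf8` are hypothetical names. The dictionary is modeled as an `Int32Array` of offsets plus a `Uint8Array` of concatenated UTF-8 bytes, mirroring Arrow's variable-length binary layout. Each dictionary index is decoded with `TextDecoder` at most once and then served from a `Map`.

```typescript
// Hypothetical model of an Arrow Utf8 dictionary (not the arrow-js API):
// entry i is the UTF-8 byte range data[offsets[i]] .. data[offsets[i + 1]].
interface Utf8Buffers {
  offsets: Int32Array; // length = number of entries + 1
  data: Uint8Array; // all entries' UTF-8 bytes, concatenated
}

const utf8Decoder = new TextDecoder("utf-8");

// Uncached lookup: decodes the entry's bytes on every call.
function decodeAt(buf: Utf8Buffers, i: number): string {
  return utf8Decoder.decode(buf.data.subarray(buf.offsets[i], buf.offsets[i + 1]));
}

// Memoized lookup: each entry is decoded at most once, then served from a Map.
class MemoizedUtf8Dictionary {
  private readonly buf: Utf8Buffers;
  private readonly cache = new Map<number, string>();
  decodeCount = 0; // counts actual decodes, for illustration only

  constructor(buf: Utf8Buffers) {
    this.buf = buf;
  }

  get(i: number): string {
    let s = this.cache.get(i);
    if (s === undefined) {
      s = decodeAt(this.buf, i);
      this.decodeCount++;
      this.cache.set(i, s);
    }
    return s;
  }
}

// Builds the offsets/bytes layout from JS strings (example data only).
function encodeUtf8(values: string[]): Utf8Buffers {
  const encoder = new TextEncoder();
  const chunks = values.map((v) => encoder.encode(v));
  const offsets = new Int32Array(values.length + 1);
  chunks.forEach((c, i) => {
    offsets[i + 1] = offsets[i] + c.length;
  });
  const data = new Uint8Array(offsets[values.length]);
  chunks.forEach((c, i) => data.set(c, offsets[i]));
  return { offsets, data };
}

// A dictionary-encoded text column: each row stores an index into the dictionary.
const dict = new MemoizedUtf8Dictionary(encodeUtf8(["apple", "banana", "cherry"]));
const indices = new Int32Array([0, 1, 0, 2, 1, 0]);
const rows = Array.from(indices, (k) => dict.get(k));
console.log(rows.length, dict.decodeCount); // 6 rows, but only 3 decodes
```

Looping over a column then costs one decode per distinct dictionary value instead of one per row, so the actual speedup depends on how many rows share each entry.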

      I would file a pull request, but (1) I would have to learn some TypeScript to do so, and (2) the idea may be undesirable: caching creates new JS string objects that increase a table's memory footprint, whereas the current implementation works directly on the underlying typed arrays.
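A sketch of one way to bound that memory cost, again assuming a simplified offsets-plus-bytes model of the dictionary (`DictionaryBuffers`, `decodeDictionary`, and `toDictionaryBuffers` are hypothetical names, not arrow-js API): decode the whole dictionary once into a plain string array. Since a dictionary stores each distinct value once, the extra memory grows with the number of dictionary entries rather than the number of rows, and each row lookup becomes plain array indexing.

```typescript
// Hypothetical model of an Arrow Utf8 dictionary (not the arrow-js API):
// entry i is the UTF-8 byte range data[offsets[i]] .. data[offsets[i + 1]].
interface DictionaryBuffers {
  offsets: Int32Array;
  data: Uint8Array;
}

// Decodes every dictionary entry exactly once into a plain string array.
// Extra memory: one JS string per dictionary entry, independent of row count.
function decodeDictionary(buf: DictionaryBuffers): string[] {
  const decoder = new TextDecoder("utf-8");
  const n = buf.offsets.length - 1;
  const out = new Array<string>(n);
  for (let i = 0; i < n; i++) {
    out[i] = decoder.decode(buf.data.subarray(buf.offsets[i], buf.offsets[i + 1]));
  }
  return out;
}

// Builds the offsets/bytes layout from JS strings (example data only).
function toDictionaryBuffers(values: string[]): DictionaryBuffers {
  const encoder = new TextEncoder();
  const chunks = values.map((v) => encoder.encode(v));
  const offsets = new Int32Array(values.length + 1);
  chunks.forEach((c, i) => {
    offsets[i + 1] = offsets[i] + c.length;
  });
  const data = new Uint8Array(offsets[values.length]);
  chunks.forEach((c, i) => data.set(c, offsets[i]));
  return { offsets, data };
}

const dictionary = decodeDictionary(toDictionaryBuffers(["red", "grün", "blue"]));
const keys = new Int32Array([2, 0, 1, 1, 2]);
// Row access is now plain array indexing; no per-row UTF-8 decoding.
const column = Array.from(keys, (k) => dictionary[k]);
console.log(column.join(", ")); // blue, red, grün, grün, blue
```

Compared with a lazily filled `Map`, this eager approach decodes entries that may never be read, but it avoids a hash lookup per row and keeps the cache size predictable.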

      Some discussion of how these issues affect the Arquero project in practice is here.

            People

              domoritz Dominik Moritz
              bmschmidt Ben Schmidt
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m