Apache Arrow / ARROW-12629

[C++] Configurable read-ahead in CSV and JSON readers

Details

    Description

      We are compiling Arrow C++ to WebAssembly and ran into the following issue with the CSV reader:

      Browsers have tightly restricted SharedArrayBuffer since the Spectre and Meltdown disclosures.

      As a result, you have to compile Arrow to WebAssembly without threads unless you are willing to serve your website under strict cross-origin isolation.

      Unfortunately, the CSV reader seems to always spawn a thread for the read-ahead in both the SerialStreamingReader and the SerialTableReader, regardless of whether use_threads is set.

      Right now, this effectively means that you cannot use the CSV (and JSON) readers in threadless WebAssembly builds.

       

      https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839

      https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913
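
To illustrate the kind of switch being requested, here is a minimal, self-contained sketch of a block iterator whose read-ahead can be turned off. It is not Arrow code: `Block`, `BlockSource`, `ReadAheadOptions`, `use_read_ahead`, `BlockIterator` and `ReadAll` are hypothetical names invented for this example, and `std::async` merely stands in for whatever background thread the real readers use. With read-ahead disabled, every block is fetched on the calling thread and no thread is ever created, which is what a threadless WebAssembly build would need.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; none of them are Arrow APIs.
using Block = std::string;
// A source returns the next block, or std::nullopt at end of input.
using BlockSource = std::function<std::optional<Block>()>;

struct ReadAheadOptions {
  // Hypothetical knob: when false, no background thread is ever spawned.
  bool use_read_ahead = true;
};

class BlockIterator {
 public:
  BlockIterator(BlockSource source, ReadAheadOptions options)
      : source_(std::move(source)), options_(options) {
    assert(source_ != nullptr);
    if (options_.use_read_ahead) Prefetch();
  }

  // The prefetch task captures `this`, so the iterator must stay in place.
  BlockIterator(const BlockIterator&) = delete;
  BlockIterator& operator=(const BlockIterator&) = delete;

  std::optional<Block> Next() {
    if (!options_.use_read_ahead) {
      return source_();  // synchronous path: runs on the calling thread
    }
    if (!pending_.valid()) return std::nullopt;  // already exhausted
    std::optional<Block> block = pending_.get();
    if (block.has_value()) Prefetch();  // keep exactly one block in flight
    return block;
  }

 private:
  void Prefetch() {
    pending_ = std::async(std::launch::async, [this] { return source_(); });
  }

  BlockSource source_;
  ReadAheadOptions options_;
  // Declared last so it is destroyed first; the destructor of a future
  // obtained from std::async waits for the task before source_ goes away.
  std::future<std::optional<Block>> pending_;
};

// Drains an in-memory list of blocks through the iterator.
std::vector<Block> ReadAll(std::vector<Block> input, ReadAheadOptions options) {
  std::size_t pos = 0;
  BlockIterator it(
      [&input, &pos]() -> std::optional<Block> {
        if (pos == input.size()) return std::nullopt;
        return input[pos++];
      },
      options);
  std::vector<Block> out;
  while (auto block = it.Next()) out.push_back(std::move(*block));
  return out;
}
```

Both settings yield the same sequence of blocks; only the threading behaviour differs. In a real reader the flag could plausibly be tied to `use_threads` or exposed as a separate read option, but that design choice is an assumption here, not something this issue specifies.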

       

       

            People

              Assignee: Unassigned
              Reporter: Andre Kohn (ankoh)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h