PARQUET-835

[C++] Add option to parquet::arrow to read columns in parallel using a thread pool

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: cpp-1.0.0
    • Component/s: parquet-cpp
    • Labels: None

    Description

      For Parquet file reads that are not IO-bound, we can go faster by reading columns in multiple threads (assuming the underlying IO source is thread-safe). The code will be very similar to that in https://github.com/apache/arrow/blob/master/python/src/pyarrow/adapters/pandas.cc#L1193
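
      A minimal sketch of the idea, not the parquet::arrow implementation: worker threads claim column indices from a shared atomic counter and each writes into its own output slot, so no locking is needed on the results. The names `Column`, `ParallelReadColumns`, and the `read_column` callback are hypothetical stand-ins for illustration; the callback represents whatever per-column read the real reader would perform, and it is assumed to be safe to call concurrently (that is, the IO source is thread-safe).

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a decoded column; not a parquet-cpp type.
struct Column {
  int index = -1;
  std::vector<int> values;
};

// Calls read_column(i, &out[i]) for every i in [0, num_columns) using up to
// nthreads worker threads. read_column returns true on success and must be
// safe to call concurrently. Returns false if any column read failed.
bool ParallelReadColumns(int num_columns, int nthreads,
                         const std::function<bool(int, Column*)>& read_column,
                         std::vector<Column>* out) {
  // Pre-size the output so each worker writes to a distinct element.
  out->assign(num_columns, Column());

  std::atomic<int> next_column(0);  // next column index to hand out
  std::atomic<bool> ok(true);       // cleared on the first failure

  auto worker = [&]() {
    while (ok.load()) {
      int i = next_column.fetch_add(1);
      if (i >= num_columns) break;
      if (!read_column(i, &(*out)[i])) ok.store(false);
    }
  };

  // Never start more threads than there are columns, and always at least one.
  nthreads = std::max(1, std::min(nthreads, num_columns));
  std::vector<std::thread> threads;
  threads.reserve(nthreads);
  for (int t = 0; t < nthreads; ++t) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
  return ok.load();
}
```

      A caller would pass the number of columns, the desired thread count, and a callback that reads one column. Handing out indices through an atomic counter keeps the threads balanced when columns differ in decode cost, which a static split of the column range would not.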

          People

            Assignee: Wes McKinney (wesm)
            Reporter: Wes McKinney (wesm)
            Votes: 0
            Watchers: 2
