Apache Arrow / ARROW-15784

[C++][Python] Parallel parquet file reading disabled with single file reads

Details

    Description

      The flag enable_parallel_column_conversion was passed down from Python to C++ when reading Parquet datasets and controlled whether columns were read in parallel. Parallel column reads were allowed for single files but not for multi-file reads. The check was an old safeguard against nested deadlock.

      Nested deadlock is no longer an issue, and the flag became mostly inert once we removed the synchronous scanner.

      Unfortunately, when we removed the synchronous scanner we forgot to remove this flag, and as a result single-file reads ended up with parallelism disabled.
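
      To make the role of such a flag concrete, here is a minimal toy model in Python. It is not Arrow's code: decode_column and read_columns are hypothetical names invented for illustration, and only the flag name comes from the description above. It sketches how one boolean can switch per-column work between a serial loop and a thread pool while producing the same result, which is why a stale flag value can disable parallelism without any visible change in output.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_column(name, raw_values):
    # Hypothetical stand-in for the per-column decode/convert work.
    return name, [value * 2 for value in raw_values]


def read_columns(raw_columns, enable_parallel_column_conversion):
    """Toy model of a reader whose column conversion is gated by a flag.

    Not Arrow's implementation; it only illustrates the control flow.
    """
    if enable_parallel_column_conversion:
        # One task per column on a thread pool.
        with ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(decode_column, name, values)
                for name, values in raw_columns.items()
            ]
            return dict(future.result() for future in futures)
    # Flag off: the same work, one column after another.
    return dict(
        decode_column(name, values) for name, values in raw_columns.items()
    )


raw = {"a": [1, 2, 3], "b": [10, 20, 30]}
serial = read_columns(raw, enable_parallel_column_conversion=False)
parallel = read_columns(raw, enable_parallel_column_conversion=True)

# Both paths return identical data; only the execution strategy differs.
assert serial == parallel
```

      Because the two paths agree on their output, a regression of this kind tends to show up only as slower reads, not as wrong results.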

            People

              westonpace Weston Pace

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 20m