  1. Parquet
  2. PARQUET-2323

Use bit vector to store Prebuffered column chunk index

Details

    Description

      In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial buffering in the Parquet file reader by storing the indices of prebuffered column chunks in a hash set, and by making a copy of this hash set for each row group reader.

      In extreme conditions, where numerous columns are prebuffered and multiple row group readers are created for the same row group, the hash set would incur significant overhead.

      Using a bit vector would be a reasonable mitigation: at one bit per column, it takes 4 KB for 32K columns.
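
      To make the 4 KB figure concrete, here is a minimal sketch of such a bit vector. The class name PrebufferedColumnBitmap and its methods are hypothetical and chosen for illustration only; this is not the actual Parquet C++ implementation, just one way the idea could look under the assumption of one bit per column chunk packed into 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the actual Parquet C++ class): tracks which column
// chunks of a row group are prebuffered, using one bit per column.
class PrebufferedColumnBitmap {
 public:
  explicit PrebufferedColumnBitmap(std::size_t num_columns)
      : num_columns_(num_columns), words_((num_columns + 63) / 64, 0) {}

  // Mark column chunk `i` as prebuffered.
  void Set(std::size_t i) {
    assert(i < num_columns_);
    words_[i / 64] |= (std::uint64_t{1} << (i % 64));
  }

  // Return true if column chunk `i` was marked as prebuffered.
  bool Test(std::size_t i) const {
    assert(i < num_columns_);
    return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
  }

  // Bytes used by the bit storage itself (excluding vector bookkeeping).
  std::size_t StorageBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

      For 32 * 1024 columns, StorageBytes() in this sketch is 4096, and its size is fixed regardless of how many columns are actually prebuffered. Copying it for each row group reader is a flat copy of 512 words, whereas copying a hash set has to rebuild one entry per stored index.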

            People

              zjpzlz Jinpeng Zhou
              zjpzlz Jinpeng Zhou
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3.5h