Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
None
Description
In https://issues.apache.org/jira/browse/PARQUET-2316 we allow partial buffer in parquet File Reader by storing prebuffered column chunk index in a hash set, and make a copy of this hash set for each rowgroup reader
In extreme conditions where numerous columns are prebuffered and multiple rowgroup readers are created for the same row group , the hash set would incur significant overhead.
Using bit vector would be a reasonsable mitigation, taking 4KB for 32K columns.
Attachments
Issue Links
- links to