[PARQUET-2323] Use bit vector to store Prebuffered column chunk index - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: cpp-13.0.0
Component/s: parquet-cpp
Labels:
- pull-request-available

Description

In PARQUET-2316 (https://issues.apache.org/jira/browse/PARQUET-2316) we allowed partial buffering in the Parquet file reader by storing the indices of prebuffered column chunks in a hash set, and by making a copy of this hash set for each row group reader.

In extreme cases, where many columns are prebuffered and multiple row group readers are created for the same row group, these hash set copies incur significant overhead.

Using a bit vector would be a reasonable mitigation: at one bit per column, it takes 4 KB for 32K columns.
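
To make the size estimate concrete, here is a minimal sketch of the idea, not the code from the linked pull request. The class name `PrebufferedColumns` and its methods are hypothetical; the sketch only assumes that each column chunk in a row group is identified by a zero-based index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the Arrow/Parquet API): tracks which column
// chunks of a row group are prebuffered, using one bit per column.
class PrebufferedColumns {
 public:
  explicit PrebufferedColumns(std::size_t num_columns)
      : num_columns_(num_columns), words_((num_columns + 63) / 64, 0) {}

  // Mark a column chunk as prebuffered.
  void Set(std::size_t column) {
    assert(column < num_columns_);
    words_[column / 64] |= (std::uint64_t{1} << (column % 64));
  }

  // Query whether a column chunk is prebuffered.
  bool IsSet(std::size_t column) const {
    assert(column < num_columns_);
    return (words_[column / 64] >> (column % 64)) & 1u;
  }

  // Bytes used for the bit storage itself (excludes vector bookkeeping).
  std::size_t SizeInBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

With 32 * 1024 columns, `SizeInBytes()` is 4096 no matter how many columns are marked, and copying the object for each row group reader is a single contiguous copy. A hash set, by contrast, typically allocates per-element storage, so its copy cost grows with the number of prebuffered columns.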

Issue Links

links to

GitHub Pull Request #36649

People

Assignee: Jinpeng Zhou

Reporter: Jinpeng Zhou

Votes: 0

Watchers: 4

Dates

Created: 12/Jul/23 18:06

Updated: 4 days ago 03:33

Resolved: 19/Jul/23 08:29

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 3.5h