SPARK-36528: Implement lazy decoding for the vectorized Parquet reader


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark first decodes encoded data (e.g., RLE/bit-packed, PLAIN) into column vectors and then operates on the decoded data. However, it may be more efficient to operate directly on the encoded data, for instance performing filters or aggregations on RLE-encoded data, or comparisons over dictionary-encoded string data, as in the sketch below. This can also potentially work with encodings in the Parquet v2 format, such as DELTA_BYTE_ARRAY.
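      A minimal sketch of the dictionary-encoded case (the class and method names here are hypothetical, not Spark's actual reader API): instead of decoding every row to a string before evaluating an equality filter, the filter value is resolved to a dictionary ID once, and the per-row work becomes a plain integer comparison. The same idea extends to RLE, where a predicate or aggregate can be evaluated once per run instead of once per row.

      import java.util.Arrays;

      // Illustrative sketch of filtering on dictionary-encoded data without
      // materializing the decoded strings. Names are hypothetical.
      public class LazyDictionaryFilterSketch {

          /** A column stored as a dictionary plus per-row dictionary IDs. */
          static final class DictionaryEncodedColumn {
              final String[] dictionary;   // distinct values
              final int[] ids;             // one dictionary ID per row

              DictionaryEncodedColumn(String[] dictionary, int[] ids) {
                  this.dictionary = dictionary;
                  this.ids = ids;
              }
          }

          /** Eager approach: decode every row to a String, then compare. */
          static boolean[] filterEager(DictionaryEncodedColumn col, String target) {
              boolean[] result = new boolean[col.ids.length];
              for (int i = 0; i < col.ids.length; i++) {
                  String decoded = col.dictionary[col.ids[i]];    // per-row decode
                  result[i] = decoded.equals(target);
              }
              return result;
          }

          /**
           * Lazy approach: resolve the target to a dictionary ID once, so the
           * per-row work is an integer comparison with no string decoding.
           */
          static boolean[] filterLazy(DictionaryEncodedColumn col, String target) {
              int targetId = -1;
              for (int i = 0; i < col.dictionary.length; i++) {
                  if (col.dictionary[i].equals(target)) {
                      targetId = i;
                      break;
                  }
              }
              boolean[] result = new boolean[col.ids.length];
              if (targetId < 0) {
                  return result;                                  // target absent: all false
              }
              for (int i = 0; i < col.ids.length; i++) {
                  result[i] = (col.ids[i] == targetId);           // integer compare only
              }
              return result;
          }

          public static void main(String[] args) {
              DictionaryEncodedColumn col = new DictionaryEncodedColumn(
                  new String[] {"red", "green", "blue"},
                  new int[] {0, 2, 1, 0, 2, 2});
              System.out.println(Arrays.toString(filterEager(col, "blue")));
              System.out.println(Arrays.toString(filterLazy(col, "blue")));
          }
      }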


People

    Assignee: Unassigned
    Reporter: Chao Sun (csun)
    Votes: 0
    Watchers: 4
