Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
- 3.3.0
- None
- None
Description
Currently, Spark first decodes Parquet data (e.g., RLE/bit-packed, PLAIN) into column vectors and then operates on the decoded data. However, it may be more efficient to operate directly on the encoded data, for instance by performing filters or aggregations on RLE-encoded data, or comparisons on dictionary-encoded string data. This could potentially also work with encodings in the Parquet v2 format, such as DELTA_BYTE_ARRAY.
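To make the idea concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark's or Parquet's actual code: the run representation `(value, run_length)` and the helper names `rle_sum`, `rle_count_where`, and `dict_filter` are hypothetical and chosen only for illustration. It shows that an aggregate or predicate can be evaluated once per run or once per dictionary entry instead of once per row.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical in-memory form of an RLE-encoded column: (value, run_length) pairs.
Run = Tuple[int, int]


def decode(runs: Sequence[Run]) -> List[int]:
    """Baseline: fully materialize the column, as a decode-first reader would."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def rle_sum(runs: Sequence[Run]) -> int:
    """Aggregate on encoded data: one multiplication per run, not one add per row."""
    return sum(value * length for value, length in runs)


def rle_count_where(runs: Sequence[Run], predicate: Callable[[int], bool]) -> int:
    """Filter on encoded data: the predicate is evaluated once per run."""
    return sum(length for value, length in runs if predicate(value))


def dict_filter(
    dictionary: Sequence[str],
    ids: Sequence[int],
    predicate: Callable[[str], bool],
) -> List[int]:
    """Compare dictionary-encoded strings: evaluate the predicate once per
    distinct dictionary entry, then select rows by integer id lookup."""
    matches = [predicate(entry) for entry in dictionary]
    return [row for row, entry_id in enumerate(ids) if matches[entry_id]]


if __name__ == "__main__":
    runs = [(5, 3), (2, 4), (9, 1)]
    # Both paths must agree; the encoded path touches 3 runs instead of 8 rows.
    assert rle_sum(runs) == sum(decode(runs))
    print(rle_sum(runs), rle_count_where(runs, lambda v: v >= 5))
    print(dict_filter(["apple", "banana", "cherry"], [0, 2, 1, 2, 0], lambda s: s > "b"))
```

The potential benefit grows with run length and with the ratio of rows to distinct dictionary entries. For columns with short runs or high cardinality, decoding first may be just as fast.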
Issue Links
- is related to: SPARK-35743 Improve Parquet vectorized reader (Resolved)