SPARK-37975

Implement vectorized BYTE_STREAM_SPLIT encoding for Parquet V2 support


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Parquet V2 defines a BYTE_STREAM_SPLIT encoding that Spark cannot currently use directly, since there is no way to enable writing this encoding. However, other engines may write files that use it, and the vectorized reader should be able to consume them. A vectorized decoder for this encoding should be implemented.
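      For reference, below is a minimal, scalar sketch of how a decoder could reassemble BYTE_STREAM_SPLIT-encoded FLOAT values; the class and method names are hypothetical and not part of Spark's or parquet-mr's existing reader APIs. A vectorized implementation would instead decode in batches directly into a WritableColumnVector, but the byte layout it has to handle is the same.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone decoder, for illustration only.
public class ByteStreamSplitFloatDecoder {

  /**
   * Decodes BYTE_STREAM_SPLIT-encoded FLOAT values.
   *
   * The encoding lays out byte i of every value contiguously, so for n values
   * stream i occupies bytes [i * n, (i + 1) * n) of the page. Decoding
   * re-interleaves the four streams back into little-endian floats.
   */
  public static float[] decode(byte[] page, int numValues) {
    final int width = Float.BYTES;            // FLOAT is split into 4 byte streams
    if (page.length != width * numValues) {
      throw new IllegalArgumentException("Unexpected page size");
    }
    byte[] scratch = new byte[width];
    float[] out = new float[numValues];
    for (int j = 0; j < numValues; j++) {
      for (int i = 0; i < width; i++) {
        scratch[i] = page[i * numValues + j]; // byte i of value j lives in stream i
      }
      out[j] = ByteBuffer.wrap(scratch)
          .order(ByteOrder.LITTLE_ENDIAN)     // Parquet stores primitives little-endian
          .getFloat();
    }
    return out;
  }
}
{code}

      DOUBLE values work the same way with eight byte streams instead of four.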

      Attachments

        Issue Links

        Activity


          People

            Assignee: Unassigned
            Reporter: Parth Chandra (parthc)

            Dates

              Created:
              Updated:
