Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- 3.2.0
- None
- None
Description
Parquet V2 defines a BYTE_STREAM_SPLIT encoding. Spark cannot currently produce files with this encoding, as there is no way to enable writing it. However, other engines may write files that use it, and the vectorized reader should be able to consume them.
A vectorized version of this encoding should be implemented.
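For reference, here is a minimal sketch of what decoding this encoding involves for FLOAT values. It assumes the layout described in the Parquet format documentation: for N values of K bytes each, the page holds K consecutive streams of N bytes, where stream b contains byte b (little-endian order) of every value. The class and method names (`ByteStreamSplitSketch`, `encodeFloats`, `decodeFloats`) are hypothetical and are not part of Spark or parquet-mr. The encoder is included only so that the round trip can be checked.

```java
// Hypothetical sketch, not Spark's or parquet-mr's actual reader code.
final class ByteStreamSplitSketch {
  // Width in bytes of a Parquet FLOAT value.
  private static final int FLOAT_WIDTH = 4;

  // Scatter byte b of value i into stream b, at offset b * n + i.
  static byte[] encodeFloats(float[] values) {
    int n = values.length;
    byte[] out = new byte[n * FLOAT_WIDTH];
    for (int i = 0; i < n; i++) {
      int bits = Float.floatToRawIntBits(values[i]);
      for (int b = 0; b < FLOAT_WIDTH; b++) {
        out[b * n + i] = (byte) (bits >>> (8 * b));
      }
    }
    return out;
  }

  // Gather byte b of value i back from stream b and rebuild the float.
  static float[] decodeFloats(byte[] page, int numValues) {
    if (page.length != numValues * FLOAT_WIDTH) {
      throw new IllegalArgumentException("page length does not match value count");
    }
    float[] out = new float[numValues];
    for (int i = 0; i < numValues; i++) {
      int bits = 0;
      for (int b = 0; b < FLOAT_WIDTH; b++) {
        bits |= (page[b * numValues + i] & 0xFF) << (8 * b);
      }
      out[i] = Float.intBitsToFloat(bits);
    }
    return out;
  }
}
```

A vectorized reader would presumably decode a whole batch per call and write directly into a column vector rather than returning a new array; the sketch above only illustrates the byte layout.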
Issue Links
- is part of SPARK-36879 Support Parquet v2 data page encodings for the vectorized path (Resolved)