Spark / SPARK-37975

Implement vectorized BYTE_STREAM_SPLIT encoding for Parquet V2 support


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Parquet V2 defines a BYTE_STREAM_SPLIT encoding that Spark cannot currently write, because there is no way to enable writing of this encoding. However, other engines may write files with this encoding, and the vectorized reader should be able to consume them.
      A vectorized decoder for this encoding should be implemented.
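
      For context, a minimal sketch of what decoding involves, assuming the layout given in the Parquet format specification: for N values of byte width K, the page holds K streams of N bytes each, where stream k carries byte k (little-endian order) of every value. The class and method names below are hypothetical and are not Spark's or parquet-mr's API; the encoder is included only to produce test input.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the Spark vectorized reader.
public class ByteStreamSplitSketch {

    // Scatter the little-endian bytes of each float into Float.BYTES streams.
    static byte[] encodeFloats(float[] values) {
        final int width = Float.BYTES;
        final int count = values.length;
        ByteBuffer plain = ByteBuffer.allocate(count * width).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            plain.putFloat(v);
        }
        byte[] interleaved = plain.array();
        byte[] encoded = new byte[count * width];
        for (int i = 0; i < count; i++) {
            for (int stream = 0; stream < width; stream++) {
                encoded[stream * count + i] = interleaved[i * width + stream];
            }
        }
        return encoded;
    }

    // Gather byte k of value i from stream k, then read the whole batch as floats.
    static float[] decodeFloats(byte[] encoded, int count) {
        final int width = Float.BYTES;
        if (encoded.length != count * width) {
            throw new IllegalArgumentException("expected " + (count * width) + " bytes");
        }
        byte[] interleaved = new byte[count * width];
        for (int stream = 0; stream < width; stream++) {
            int base = stream * count;
            for (int i = 0; i < count; i++) {
                interleaved[i * width + stream] = encoded[base + i];
            }
        }
        float[] out = new float[count];
        ByteBuffer.wrap(interleaved).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        float[] values = {1.0f, 2.0f};
        float[] roundTrip = decodeFloats(encodeFloats(values), values.length);
        System.out.println(java.util.Arrays.toString(roundTrip));
    }
}
```

      The sketch decodes a whole batch in one pass rather than value by value, which is the property a vectorized reader needs. A real implementation would presumably write into Spark's column vectors and also handle DOUBLE; both are left out here.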

            People

              Assignee: Unassigned
              Reporter: Parth Chandra (parthc)