SPARK-37975

Implement vectorized BYTE_STREAM_SPLIT encoding for Parquet V2 support


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Parquet V2 defines a BYTE_STREAM_SPLIT encoding that Spark cannot currently use directly, since there is no way to enable writing this encoding. However, other engines may write files that use it, and the vectorized reader should be able to consume them. A vectorized decoder for this encoding should be implemented.
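      For reference, below is a minimal, scalar sketch of how a decoder could reassemble BYTE_STREAM_SPLIT-encoded FLOAT values; the class and method names are hypothetical and not part of Spark's or parquet-mr's existing reader APIs. A vectorized implementation would instead decode in batches directly into a WritableColumnVector, but the byte layout it has to handle is the same.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone decoder, for illustration only.
public class ByteStreamSplitFloatDecoder {

  /**
   * Decodes BYTE_STREAM_SPLIT-encoded FLOAT values.
   *
   * The encoding lays out byte i of every value contiguously, so for n values
   * stream i occupies bytes [i * n, (i + 1) * n) of the page. Decoding
   * re-interleaves the four streams back into little-endian floats.
   */
  public static float[] decode(byte[] page, int numValues) {
    final int width = Float.BYTES;            // FLOAT is split into 4 byte streams
    if (page.length != width * numValues) {
      throw new IllegalArgumentException("Unexpected page size");
    }
    byte[] scratch = new byte[width];
    float[] out = new float[numValues];
    for (int j = 0; j < numValues; j++) {
      for (int i = 0; i < width; i++) {
        scratch[i] = page[i * numValues + j]; // byte i of value j lives in stream i
      }
      out[j] = ByteBuffer.wrap(scratch)
          .order(ByteOrder.LITTLE_ENDIAN)     // Parquet stores primitives little-endian
          .getFloat();
    }
    return out;
  }
}
{code}

      DOUBLE values work the same way with eight byte streams instead of four.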

      Attachments

        Issue Links

        Activity


          People

            Assignee: Unassigned
            Reporter: Parth Chandra (parthc)

            Dates

              Created:
              Updated:
