SPARK-32935 (sub-task of SPARK-27589: Spark file source V2)

File source V2: support bucketing


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Datasource V2 does not currently support bucketed reads or writes the way Datasource V1 does. See DataSourceScanExec and the config spark.sql.sources.bucketing.enabled. We need to add the same support to V2.

       

      Writing a file data source with bucketing looks like:

       
      fileDf.write.bucketBy(...).sortBy(..)... 
      


          People

            Assignee: Unassigned
            Reporter: Gengliang Wang
            Votes: 0
            Watchers: 13
