SPARK-32544

Bucketing and partitioning information is not passed on to non-FileFormat datasource writes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      When writing to a FileFormat datasource, the bucket spec and partition columns are passed into InsertIntoHadoopFsRelationCommand: https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L474-L475.
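
      For context, here is a minimal runnable sketch of the kind of write this refers to (the table and column names are made up). With a FileFormat source such as Parquet, the partitioning and bucketing declared on the writer are carried through to InsertIntoHadoopFsRelationCommand:

      {code:scala}
      import org.apache.spark.sql.SparkSession

      object BucketedParquetWriteExample {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("bucketed-parquet-write")
            .master("local[*]")
            .getOrCreate()
          import spark.implicits._

          val df = Seq((1, "2020-08-01"), (2, "2020-08-02")).toDF("id", "date")

          // Parquet is a FileFormat source, so the partition columns and bucket
          // spec declared here end up on InsertIntoHadoopFsRelationCommand.
          df.write
            .partitionBy("date")
            .bucketBy(4, "id")
            .format("parquet")
            .saveAsTable("events")  // bucketBy is only supported with saveAsTable

          spark.stop()
        }
      }
      {code}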

      However, from what I can tell, the RelationProvider API does not have a way to pass these in: https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L511-L513
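
      For comparison, here is a minimal sketch of a hypothetical non-FileFormat datasource (the class name is made up). As far as I can tell, the v1 write path for such a source goes through CreatableRelationProvider.createRelation (org.apache.spark.sql.sources), which only receives the save mode, the string options, and the DataFrame itself, so there is no parameter through which the bucket spec or partition columns could arrive:

      {code:scala}
      import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
      import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}
      import org.apache.spark.sql.types.StructType

      // Hypothetical provider, for illustration only.
      class ExampleSinkProvider extends CreatableRelationProvider {
        override def createRelation(
            ctx: SQLContext,
            mode: SaveMode,
            parameters: Map[String, String],
            data: DataFrame): BaseRelation = {
          // A real implementation would write `data` to the external system here.
          // Note: nothing in this signature tells the provider how the user
          // partitioned or bucketed the write.
          new BaseRelation {
            override def sqlContext: SQLContext = ctx
            override def schema: StructType = data.schema
          }
        }
      }
      {code}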

          People

            Assignee: Unassigned
            Reporter: Rahij Ramsharan
            Votes: 0
            Watchers: 2
