  1. Spark
  2. SPARK-5947

First class partitioning support in data sources API

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      For file-system-based data sources, implementing Hive-style partitioning support can be complex and error-prone. Specifically, partitioning support includes:

      1. Partition discovery: given a directory laid out like Hive partitions, automatically discover the directory structure and partitioning information, including partition column names, data types, and values.
      2. Reading from partitioned tables
      3. Writing to partitioned tables
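As an illustration of item 1, here is a minimal sketch of partition discovery over Hive-style `key=value` directories. It is written in Python for brevity and is not Spark's implementation. The function names (`split_partition_dirs`, `infer_type`, `discover_partitions`), the example paths, and the type-inference rule (int, then float, else string) are all assumptions made for this example.

```python
import re
from pathlib import PurePosixPath


def split_partition_dirs(base, file_path):
    """Return [(column, raw_value), ...] from the key=value directories under base."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base))
    pairs = []
    for segment in relative.parts[:-1]:  # the last part is the file name
        name, sep, raw = segment.partition("=")
        if not sep or not name:
            raise ValueError(f"not a key=value directory: {segment!r}")
        pairs.append((name, raw))
    return pairs


def infer_type(raw_values):
    """Pick the narrowest type that fits every raw value (assumed rule)."""
    if all(re.fullmatch(r"-?\d+", r) for r in raw_values):
        return int
    if all(re.fullmatch(r"-?\d+(\.\d+)?", r) for r in raw_values):
        return float
    return str


def discover_partitions(base, file_paths):
    """Return (schema, partitions).

    schema maps column name -> type name; partitions maps each file path
    to its typed partition values.
    """
    raw = {p: split_partition_dirs(base, p) for p in file_paths}
    columns = {}
    for pairs in raw.values():
        for name, value in pairs:
            columns.setdefault(name, []).append(value)
    types = {name: infer_type(values) for name, values in columns.items()}
    partitions = {
        path: {name: types[name](value) for name, value in pairs}
        for path, pairs in raw.items()
    }
    schema = {name: t.__name__ for name, t in types.items()}
    return schema, partitions
```

Given files such as `/data/t/year=2014/country=US/part-0.parquet`, the sketch reports the columns `year` and `country`, along with the typed values for each file. A real implementation would also have to list the file system, handle escaped characters in directory names, and validate that every file has the same partition columns.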

      It would be good to have first-class partitioning support in the data sources API, for example by adding a FileBasedScan trait with callbacks and default implementations for these features.
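To make the "callbacks and default implementations" idea concrete, here is a hypothetical sketch. It is written in Python rather than as a Scala trait, and it is not the API that was eventually added to Spark. The class borrows the name `FileBasedScan` from the proposal, but every method name and signature here is an assumption. The data source supplies only a file-reading callback; partition pruning, appending partition values to rows, and building the output directory for writes are defaults.

```python
from abc import ABC, abstractmethod


class FileBasedScan(ABC):
    """Hypothetical base class: partition handling is shared, file I/O is a callback."""

    def __init__(self, partitions):
        # partitions: {directory: {column: value}}, e.g. from partition discovery
        self.partitions = partitions

    @abstractmethod
    def read_file_rows(self, directory):
        """Callback: yield the row dicts stored under one partition directory."""

    def select_partitions(self, predicate):
        """Default pruning: keep directories whose partition values pass the predicate."""
        return [d for d, values in self.partitions.items() if predicate(values)]

    def scan(self, predicate=lambda values: True):
        """Default read: prune, read each directory, append partition columns to rows."""
        for directory in self.select_partitions(predicate):
            values = self.partitions[directory]
            for row in self.read_file_rows(directory):
                yield {**row, **values}

    @staticmethod
    def partition_dir(base, columns, row):
        """Default write path: base/col1=v1/col2=v2 for the given row."""
        return "/".join([base.rstrip("/")] + [f"{c}={row[c]}" for c in columns])


class InMemoryScan(FileBasedScan):
    """Toy data source that keeps its 'files' in a dict, used only for illustration."""

    def __init__(self, partitions, data):
        super().__init__(partitions)
        self.data = data

    def read_file_rows(self, directory):
        yield from self.data.get(directory, [])
```

With this shape, a filter on a partition column never touches the directories it excludes: `scan(lambda v: v["year"] == 2015)` calls `read_file_rows` only for the matching directory.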

            People

            • Assignee: Michael Armbrust (marmbrus)
            • Reporter: Cheng Lian (lian cheng)
            • Votes: 1
            • Watchers: 8
