SPARK-5947

First class partitioning support in data sources API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      For file-system-based data sources, implementing Hive-style partitioning support can be complex and error-prone. Specifically, partitioning support includes:

      1. Partition discovery: Given a directory organized similar to Hive partitions, discover the directory structure and partitioning information automatically, including partition column names, data types, and values.
      2. Reading from partitioned tables
      3. Writing to partitioned tables
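
      To make item 1 concrete, here is a minimal sketch of partition discovery over Hive-style paths such as `year=2015/month=1`. It is an illustration only, not Spark's implementation: the function names (`parse_partition_path`, `infer_type`, `discover_partitions`) are hypothetical, type inference is limited to int, float, and string, and every path is assumed to carry the same partition columns in the same order.

```python
from typing import Dict, List, Tuple, Union
from urllib.parse import unquote

Value = Union[int, float, str]


def parse_partition_path(path: str) -> Dict[str, str]:
    """Collect raw column=value pairs from one Hive-style path.

    Segments without '=' (table root, file name) are ignored.
    """
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            name, _, raw = segment.partition("=")
            spec[name] = unquote(raw)  # assumption: values are URL-escaped
    return spec


def infer_type(raw_values: List[str]) -> type:
    """Pick the narrowest of int, float, str that parses every value."""
    for caster in (int, float):
        try:
            for raw in raw_values:
                caster(raw)
        except ValueError:
            continue
        return caster
    return str


def discover_partitions(
    paths: List[str],
) -> Tuple[List[Tuple[str, str]], List[Dict[str, Value]]]:
    """Return (partition schema, typed partition values per path)."""
    specs = [parse_partition_path(p) for p in paths]
    columns = list(specs[0]) if specs else []
    for spec in specs:
        if list(spec) != columns:
            raise ValueError(f"inconsistent partition columns: {list(spec)}")
    types = {c: infer_type([s[c] for s in specs]) for c in columns}
    schema = [(c, types[c].__name__) for c in columns]
    values = [{c: types[c](s[c]) for c in columns} for s in specs]
    return schema, values
```

      Calling `discover_partitions` on a list of file paths yields the column names, an inferred type per column, and the typed values for each path, which are the three pieces of information listed in item 1.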

      It would be good to have first-class partitioning support in the data sources API, for example by adding a FileBasedScan trait with callbacks and default implementations for these features.
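
      One way to picture the "callbacks and default implementations" idea is sketched below in Python rather than as a Scala trait. Everything here is hypothetical and not the API that was eventually added to Spark: the method names (`list_files`, `read_file`, `write_file`, `scan`, `write`) and the `InMemorySource` test double are assumptions made for illustration. The sketch also assumes partition values contain no `/` or `=` characters, and it returns partition values as strings without type inference.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


class FileBasedScan(ABC):
    """Hypothetical base class: a data source supplies file I/O callbacks,
    and partitioned reads and writes come from the defaults below."""

    def __init__(self, partition_columns: List[str]):
        self.partition_columns = partition_columns

    # Callbacks each concrete data source must implement.
    @abstractmethod
    def list_files(self) -> List[str]: ...

    @abstractmethod
    def read_file(self, path: str) -> Iterable[Row]: ...

    @abstractmethod
    def write_file(self, path: str, rows: List[Row]) -> None: ...

    # Default implementation: writing to partitioned tables.
    def write(self, rows: Iterable[Row]) -> None:
        groups: Dict[str, List[Row]] = defaultdict(list)
        for row in rows:
            segments = [f"{c}={row[c]}" for c in self.partition_columns]
            # Partition columns live in the path, not in the data file.
            data = {k: v for k, v in row.items()
                    if k not in self.partition_columns}
            groups["/".join(segments + ["part-0"])].append(data)
        for path, data_rows in groups.items():
            self.write_file(path, data_rows)

    # Default implementation: reading with partition pruning.
    def scan(self, **wanted: str) -> Iterator[Row]:
        for path in self.list_files():
            spec = dict(seg.split("=", 1)
                        for seg in path.split("/") if "=" in seg)
            if all(spec.get(c) == v for c, v in wanted.items()):
                for row in self.read_file(path):
                    yield {**row, **spec}  # re-attach partition values


class InMemorySource(FileBasedScan):
    """Toy source backed by a dict, standing in for a real file format."""

    def __init__(self, partition_columns: List[str]):
        super().__init__(partition_columns)
        self.files: Dict[str, List[Row]] = {}

    def list_files(self) -> List[str]:
        return sorted(self.files)

    def read_file(self, path: str) -> Iterable[Row]:
        return self.files[path]

    def write_file(self, path: str, rows: List[Row]) -> None:
        self.files[path] = rows
```

      With this split, a new file format only implements the three callbacks, while path layout and pruning of non-matching partition directories stay in one shared place.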


          People

            marmbrus Michael Armbrust
            lian cheng Cheng Lian
            Votes: 1
            Watchers: 7
