  Spark / SPARK-17861

Store data source partitions in metastore and push partition pruning into metastore

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, Spark SQL does not store any partition information in the catalog for data source tables, because it was originally designed to work with arbitrary files. This, however, causes a few issues for catalog tables:

      1. Listing partitions for a large table (with millions of partitions) can be very slow during cold start.
      2. Heterogeneous partition naming schemes are not supported.
      3. Partition pruning cannot be pushed into the metastore.

      This ticket tracks the work required to push the tracking of partitions into the metastore. This change should be guarded by a feature flag.
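
      To make the three issues above concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark code: `ToyMetastore`, `discover_by_listing`, and the example paths are hypothetical names invented for illustration. It contrasts inferring partitions from file paths with asking a catalog that tracks partition specs and can apply a predicate itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical in-memory stand-in for a metastore's partition table."""

    # Maps a partition spec such as (("date", "2016-10-01"),) to its location.
    partitions: dict = field(default_factory=dict)

    def add_partition(self, spec: dict, location: str) -> None:
        # The location is stored explicitly, so it need not follow key=value.
        self.partitions[tuple(sorted(spec.items()))] = location

    def list_partitions(self, predicate=None) -> list:
        """Return sorted locations of partitions whose spec passes the predicate."""
        return sorted(
            location
            for key, location in self.partitions.items()
            if predicate is None or predicate(dict(key))
        )


def discover_by_listing(all_paths: list) -> list:
    """Without catalog tracking: infer each partition spec from its path."""
    found = []
    for path in all_paths:
        # Only path segments shaped like key=value contribute to the spec.
        spec = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        found.append((spec, path))
    return found


store = ToyMetastore()
store.add_partition({"date": "2016-10-01"}, "/data/events/date=2016-10-01")
store.add_partition({"date": "2016-10-02"}, "/data/events/date=2016-10-02")
# Heterogeneous naming: this location does not use the key=value layout.
store.add_partition({"date": "2016-10-03"}, "/archive/oct-third")

# Pruning happens inside the catalog; the caller never sees other partitions.
pruned = store.list_partitions(lambda spec: spec["date"] >= "2016-10-02")

# Path-based discovery cannot recover the spec for the irregular location.
inferred = discover_by_listing(["/archive/oct-third"])
```

      In this toy model the caller receives only the two matching locations, whereas path-based discovery has to visit every path and yields an empty spec for the location that does not follow the key=value convention. A real metastore would evaluate the filter on its own side rather than through a Python callback.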

              People

              • Assignee:
                ekhliang Eric Liang
                Reporter:
                rxin Reynold Xin
              • Votes:
                0
                Watchers:
                14
