Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-20331

Broaden support for Hive partition pruning predicate pushdown

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.0
    • 2.3.0
    • SQL

    Description

      Spark 2.1 introduced scalable support for Hive tables with huge numbers of partitions. Key to leveraging this support is the ability to prune unnecessary table partitions to answer queries. Spark supports a subset of the class of partition pruning predicates that the Hive metastore supports. If a user writes a query with a partition pruning predicate that is not supported by Spark, Spark falls back to loading all partitions and pruning client-side. We want to broaden Spark's current partition pruning predicate pushdown capabilities.

      One of the key missing capabilities is support for disjunctions. For example, for a table partitioned by date, specifying with a predicate like

      date = 20161011 or date = 20161014

      will result in Spark fetching all partitions. For a table partitioned by date and hour, querying a range of hours across dates can be quite difficult to accomplish without fetching all partition metadata.

      The current partition pruning support supports only comparisons against literals. We can expand that to foldable expressions by evaluating them at planning time.

      We can also implement support for the "IN" comparison by expanding it to a sequence of "OR"s.

      This ticket covers those enhancements.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            michael Michael MacFadden
            michael Michael MacFadden
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment