Spark / SPARK-14172

Hive table partition predicate not passed down correctly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: SQL

    Description

      When a Hive SQL query contains a nondeterministic expression in its WHERE clause, the Spark plan does not push the partition predicate down to HiveTableScan. For example:

      -- Consider the following query, which uses a random function to sample rows
      SELECT *
      FROM table_a
      WHERE partition_col = 'some_value'
      AND rand() < 0.01;
      

      The Spark plan does not push the partition predicate down to HiveTableScan, so the scan ends up reading data from every partition of the table instead of only the partition where partition_col = 'some_value'.
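
      The behavior one would expect here can be illustrated with a small toy model. The following is a minimal sketch in plain Python, not Spark's optimizer code: the `Conjunct` class and the `split_for_partition_pruning` function are hypothetical names introduced only for illustration. It assumes the WHERE clause is a conjunction of AND-ed terms, and it treats a term as usable for partition pruning when it is deterministic and references only partition columns. It deliberately ignores evaluation-order concerns that a real optimizer would have to consider.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Conjunct:
    """One AND-ed term of a WHERE clause (hypothetical model, not a Spark class)."""
    sql: str
    columns: FrozenSet[str]   # columns the term references
    deterministic: bool       # False for terms such as rand() < 0.01


def split_for_partition_pruning(
    conjuncts: List[Conjunct], partition_columns: FrozenSet[str]
) -> Tuple[List[Conjunct], List[Conjunct]]:
    """Return (pruning, residual).

    A term goes into `pruning` when it is deterministic and references
    only partition columns. Every other term stays in `residual`, the
    filter applied to the rows the scan returns.
    """
    pruning: List[Conjunct] = []
    residual: List[Conjunct] = []
    for c in conjuncts:
        if c.deterministic and c.columns and c.columns <= partition_columns:
            pruning.append(c)
        else:
            residual.append(c)
    return pruning, residual


# The two terms of the query from the description.
where_clause = [
    Conjunct("partition_col = 'some_value'", frozenset({"partition_col"}), True),
    Conjunct("rand() < 0.01", frozenset(), False),
]

pruning, residual = split_for_partition_pruning(
    where_clause, frozenset({"partition_col"})
)
print([c.sql for c in pruning])   # ["partition_col = 'some_value'"]
print([c.sql for c in residual])  # ['rand() < 0.01']
```

      Under this model, the partition predicate would still reach the scan and only the `rand()` term would remain as a post-scan filter. The report describes the opposite outcome: the presence of the nondeterministic term prevents the partition predicate from being pushed down at all.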

          People

            Assignee: Unassigned
            Reporter: Yingji Zhang (yingjizhang)
            Votes: 1
            Watchers: 13
