Hive / HIVE-16869

Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

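      When hive.optimize.ppd and hive.optimize.index.filter are both turned on, and a select query has a condition on a column that doesn't exist in the Parquet file (such as a partition column), Hive can return a wrong result.

      See the example below for details. The table holds a single row (a=1, b=2) in partition p=1, so every query shown should return that row. After hive.optimize.index.filter is set to true, the last two queries return no rows: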

      hive> create table test_parq (a int, b int) partitioned by (p int) stored as parquet;
      OK
      Time taken: 0.292 seconds
      hive> insert overwrite table test_parq partition (p=1) values (1, 2);
      OK
      Time taken: 5.08 seconds
      hive> select * from test_parq where a=1 and p=1;
      OK
      1	2	1
      Time taken: 0.441 seconds, Fetched: 1 row(s)
      hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
      OK
      1	2	1
      Time taken: 0.197 seconds, Fetched: 1 row(s)
      hive> set hive.optimize.index.filter=true;
      hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
      OK
      Time taken: 0.167 seconds
      hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1);
      OK
      Time taken: 0.563 seconds
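
      The empty results above are consistent with the pushed-down filter treating the partition column p, which is not stored in the Parquet file, as null. The following is a minimal Python sketch of that reading, not Hive or Parquet code: the predicate model and the names evaluate, prune, and reader_keeps_row are hypothetical, and the source does not confirm that this is the actual root cause or the approach taken in the attached patches. It also shows one conservative way to push down only the part of a predicate that refers to columns present in the file.

```python
# Illustrative model only; not Hive or Parquet code. All names are hypothetical.
FILE_COLUMNS = {"a", "b"}   # columns physically stored in the Parquet file
ROW = {"a": 1, "b": 2}      # the single stored row; p=1 lives in the partition path


def eq(col, val):
    return ("eq", col, val)


def and_(left, right):
    return ("and", left, right)


def or_(left, right):
    return ("or", left, right)


def evaluate(pred, row):
    """Evaluate a predicate against a file row; a missing column reads as None."""
    op = pred[0]
    if op == "eq":
        return row.get(pred[1]) == pred[2]
    left, right = evaluate(pred[1], row), evaluate(pred[2], row)
    return (left and right) if op == "and" else (left or right)


def prune(pred, columns):
    """Drop leaves on missing columns; return None when nothing safe remains."""
    op = pred[0]
    if op == "eq":
        return pred if pred[1] in columns else None
    left, right = prune(pred[1], columns), prune(pred[2], columns)
    if op == "and":
        # Dropping one side of an AND only weakens the filter, which is safe.
        if left is None:
            return right
        if right is None:
            return left
        return ("and", left, right)
    # An OR with an unknown side could match any row, so nothing can be pushed down.
    if left is None or right is None:
        return None
    return ("or", left, right)


def reader_keeps_row(pred, row, columns):
    """Apply only the safely pushable part of the predicate at the reader."""
    safe = prune(pred, columns)
    return True if safe is None else evaluate(safe, row)


# The two queries from the transcript that returned no rows.
q1 = or_(and_(eq("a", 1), eq("p", 1)), and_(eq("a", 999), eq("p", 999)))
q2 = and_(or_(eq("a", 1), eq("a", 999)), or_(eq("a", 999), eq("p", 1)))

print(evaluate(q1, ROW), evaluate(q2, ROW))
print(reader_keeps_row(q1, ROW, FILE_COLUMNS), reader_keeps_row(q2, ROW, FILE_COLUMNS))
```

      In this model, evaluating the full predicate at the reader drops the row for both queries, matching the transcript. With pruning, the reader keeps the row, and the engine can then apply the complete predicate, including the partition value, to produce the final result.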
      

      Attachments

        1. HIVE-16869.1.patch
          8 kB
          Yibing Shi
        2. HIVE-16869.2.patch
          8 kB
          Yibing Shi


          People

            Assignee: Yibing Shi
            Reporter: Yibing Shi
            Votes: 0
            Watchers: 5
