Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-495

Views Sometimes Not Utilizing Partition Pruning

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 1.1
    • Impala 1.1.1
    • None
    • None
    • - Partitioned Table
      - Impala 1.1
      - CentOS 5.9

    Description

      It seems that queries which utilize partition pruning using a nested WHERE are misbehaving when in a VIEW. Not only does it seem the VIEW isn't taking advantage of partition pruning, but it's also returning an incorrect result. See the following example:

      Partitions:
      customerUUID
      collectionTS

      Just The Query #1:

      SELECT COUNT(*) FROM tbl1
      WHERE customerUUID='AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE'
      AND (clusterName = 'Cluster 1 - CDH4' OR clusterName = 'null')
      AND collectionTS=1375104984000 AND service='zookeeper1'
      AND role='zookeeper1-SERVER-9'
      AND ts >= 1374891637995
      AND ts <= 1375103039311;
      

      Pruning

      ...
        0:SCAN HDFS
           table=default.tbl1 #partitions=1 size=71.73MB
      ...
      

      Just The Query #2

      SELECT COUNT(*) FROM tbl1
      WHERE customerUUID='AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE'
      AND clusterName IN ('Cluster 1 - CDH4', 'null')
      AND collectionTS=1375104984000 AND service='zookeeper1'
      AND role='zookeeper1-SERVER-9'
      AND ts >= 1374891637995
      AND ts <= 1375103039311;
      

      Pruning

        0:SCAN HDFS
           table=default.tbl1 #partitions=1 size=71.73MB
      

      Query #1 With View

      CREATE VIEW tbl1_view AS SELECT message, level, class, ts FROM tbl1
      where customerUUID='AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE'
      AND (clusterName = 'Cluster 1 - CDH4' OR clusterName = 'null')
      AND collectionTS=1375104984000 AND service='zookeeper1'
      AND role='zookeeper1-SERVER-9'
      AND ts >= 1374891637995
      AND ts <= 1375103039311;
      

      Pruning

        0:SCAN HDFS
           table=default.tbl1 #partitions=2652 size=852.35GB
      

      Doesn't seem to like (clusterName = 'Cluster 1 - CDH4' OR clusterName = 'null')

      Query #2 With View

      CREATE VIEW tbl1_view AS SELECT message, level, class, ts FROM tbl1
      where customerUUID='AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE'
      AND clusterName IN ('Cluster 1 - CDH4', 'null')
      AND collectionTS=1375104984000 AND service='zookeeper1'
      AND role='zookeeper1-SERVER-9'
      AND ts >= 1374891637995
      AND ts <= 1375103039311;
      

      Pruning

        0:SCAN HDFS
           table=default.tbl1 #partitions=1 size=71.73MB
      

      Attachments

        Activity

          People

            marcelk Marcel Kinard
            rickysaltzer Ricky Saltzer
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: