Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2108

Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 2.2
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      Problem Statement
      Impala fails to prune partitions if the partition-column filters are part of a "non-trivial" disjunction where each disjunct itself consists of conjuncts referencing both partition and non-partition columns.

      Consider the following example:

      create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
      [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
      Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial djsiunctions
      +----------------------------------------------------------------------------------------+
      | Explain String                                                                         |
      +----------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                              |
      | WARNING: The following tables are missing relevant table and/or column statistics.     |
      | default.test_table                                                                     |
      |                                                                                        |
      | 01:EXCHANGE [UNPARTITIONED]                                                            |
      | |                                                                                      |
      | 00:SCAN HDFS [default.test_table]                                                      |
      |    partitions=5/5 files=9 size=36B                                                     |
      |    predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 AND c2 = 'c') |
      +----------------------------------------------------------------------------------------+
      Fetched 9 row(s) in 0.04s
      [localhost.localdomain:21000] > 
      

      Cause
      This is a limitation in how Impala filters partitions.

      Workaround
      The above example can be fixed by manually rewriting the predicate as follows:

      select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
      

      Proposed fix
      The proposed fix is for Impala to automatically do what is stated in the workaround above:
      Extract the partition-column filters from the disjunctions, create a new predicate with all those partition-column filters connected with OR, and add the new predicate to the original one with AND.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                bharathv bharath v
                Reporter:
                alex.behm Alexander Behm
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: