Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2108

Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 2.2
    • None
    • Frontend

    Description

      Problem Statement
      Impala fails to prune partitions if the partition-column filters are part of a "non-trivial" disjunction where each disjunct itself consists of conjuncts referencing both partition and non-partition columns.

      Consider the following example:

      create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
      [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
      Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial djsiunctions
      +----------------------------------------------------------------------------------------+
      | Explain String                                                                         |
      +----------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                              |
      | WARNING: The following tables are missing relevant table and/or column statistics.     |
      | default.test_table                                                                     |
      |                                                                                        |
      | 01:EXCHANGE [UNPARTITIONED]                                                            |
      | |                                                                                      |
      | 00:SCAN HDFS [default.test_table]                                                      |
      |    partitions=5/5 files=9 size=36B                                                     |
      |    predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 AND c2 = 'c') |
      +----------------------------------------------------------------------------------------+
      Fetched 9 row(s) in 0.04s
      [localhost.localdomain:21000] > 
      

      Cause
      This is a limitation in how Impala filters partitions.

      Workaround
      The above example can be fixed by manually rewriting the predicate as follows:

      select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
      

      Proposed fix
      The proposed fix is for Impala to automatically do what is stated in the workaround above:
      Extract the partition-column filters from the disjunctions, create a new predicate with all those partition-column filters connected with OR, and add the new predicate to the original one with AND.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              alex.behm Alexander Behm
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: