Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9162

Incorrect redundant predicate applied to outer join

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 3.4.0
    • Frontend
    • ghx-label-14

    Description

      Run the attached create.sql script to create the tables and view. The following query shows an incorrect redundant predicate applied to the outer join. This seems another variant of past issues such as IMPALA-7957 and IMPALA-8386.

      // Has a redundant predicate as 'Other predicates' on Outer Join
      
      Query: explain select x.* from (select v1.c3, v1.max_c3 from v.t2 left join v.v1 on  t2.c2=v1.c3) as x
                                                                                        
       06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
         hash predicates: c3 = t2.c2
         other predicates: c3 = max(c3) .    <-- WRONG
         runtime filters: RF000 <- t2.c2
         row-size=20B cardinality=397
                         
       --13:EXCHANGE [HASH(t2.c2)]                          
                                                                          
         00:SCAN HDFS [v.t2]                                            
            HDFS partitions=1/1 files=1 size=639B
            row-size=4B cardinality=397                               
                                                                                           
       12:EXCHANGE [HASH(c3)]                                                              
                                                                                           
       05:HASH JOIN [INNER JOIN, BROADCAST]                                                
         hash predicates: c3 = max(c3)                                                     
         runtime filters: RF002 <- max(c3)                                                 
         row-size=16B cardinality=207       
      

      By comparison, the following query which does not have the v1.max_c3 column in the SELECT list produces the correct plan:

      // Does not have the redundant predicate
      
      Query: explain select x.* from (select v1.c3 from v.t2 left join v.v1 on  t2.c2=v1.c3) as x
      
       06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
         hash predicates: c3 = t2.c2
         runtime filters: RF000 <- t2.c2
         row-size=20B cardinality=397
      
       --13:EXCHANGE [HASH(t2.c2)]
      
         00:SCAN HDFS [v.t2]
            HDFS partitions=1/1 files=1 size=639B
            row-size=4B cardinality=397
      
       12:EXCHANGE [HASH(c3)]
      
       05:HASH JOIN [INNER JOIN, BROADCAST]
         hash predicates: c3 = max(c3)
         runtime filters: RF002 <- max(c3)
         row-size=16B cardinality=207
      
      

      Due the redundant predicate, the first query produces wrong results.

      Attachments

        1. create.sql.txt
          0.8 kB
          Aman Sinha

        Issue Links

          Activity

            People

              amansinha Aman Sinha
              amansinha Aman Sinha
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: