Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-860

Inner join after right join may produce wrong results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • Impala 1.3
    • Impala 1.3
    • None

    Description

      I only tried this on master, 42db599d3485527db80483004ba6a73cb891d0d2.

      Please see the queries below. The bottom one has straight_join set and produces results that match mysql.

      [localhost:21000] > select count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col;
      Query: select count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col
      10000
      Returned 1 row(s) in 0.14s
      
      
      [localhost:21000] > explain select count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col;
      Query: explain select count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col
      Estimated Per-Host Requirements: Memory=2.17GB VCores=2
      ""
      10:AGGREGATE [MERGE FINALIZE]
      |  output: sum(count(*))
      |
      09:EXCHANGE [PARTITION=UNPARTITIONED]
      |
      05:AGGREGATE
      |  output: count(*)
      |
      04:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
      |  hash predicates: t1.int_col = t2.int_col
      |
      |--08:EXCHANGE [PARTITION=HASH(t2.int_col)]
      |  |
      |  01:SCAN HDFS [functional.alltypesagg t2]
      |     partitions=10/10 size=743.67KB
      |
      07:EXCHANGE [PARTITION=HASH(t1.int_col)]
      |
      03:HASH JOIN [INNER JOIN, BROADCAST]
      |  hash predicates: t1.string_col = t3.a
      |
      |--06:EXCHANGE [BROADCAST]
      |  |
      |  02:SCAN HDFS [functional.tinytable t3]
      |     partitions=1/1 size=38B
      |
      00:SCAN HDFS [functional.alltypes t1]
         partitions=24/24 size=478.45KB
      Returned 30 row(s) in 0.02s
      
      
      [localhost:21000] > select straight_join count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col;
      Query: select straight_join count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col
      0
      Returned 1 row(s) in 0.14s
      
      
      [localhost:21000] > explain select straight_join count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col;
      Query: explain select straight_join count(*) from alltypes t1 right join alltypesagg t2 on t2.int_col = t1.int_col join tinytable t3 on t3.a = t1.string_col
      Estimated Per-Host Requirements: Memory=2.17GB VCores=3
      ""
      10:AGGREGATE [MERGE FINALIZE]
      |  output: sum(count(*))
      |
      09:EXCHANGE [PARTITION=UNPARTITIONED]
      |
      05:AGGREGATE
      |  output: count(*)
      |
      04:HASH JOIN [INNER JOIN, BROADCAST]
      |  hash predicates: t1.string_col = t3.a
      |
      |--08:EXCHANGE [BROADCAST]
      |  |
      |  02:SCAN HDFS [functional.tinytable t3]
      |     partitions=1/1 size=38B
      |
      03:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
      |  hash predicates: t1.int_col = t2.int_col
      |
      |--07:EXCHANGE [PARTITION=HASH(t2.int_col)]
      |  |
      |  01:SCAN HDFS [functional.alltypesagg t2]
      |     partitions=10/10 size=743.67KB
      |
      06:EXCHANGE [PARTITION=HASH(t1.int_col)]
      |
      00:SCAN HDFS [functional.alltypes t1]
         partitions=24/24 size=478.45KB
      Returned 30 row(s) in 0.02s
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            caseyc casey
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: