Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-90

Right outer Join includes NULLs as well and hence wrong result count

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 0.3
    • Impala 1.0
    • None

    Description

      When using the right outer join, the result set is showing rows that are duplicates and with NULLs. For eg:

      The left table looks like this (first column is the join_col)
      A ---- 1
      B ---- 2
      C ---- 3

      right table looks like (first column is the join_col)
      A
      B
      D
      E

      The result:

      A — 1
      A — NULL
      B — 2
      B — NULL
      D — NULL
      E — NULL

      The Impala query -

      insert into table join_impala 
      select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk); 
      

      customer: "# of rows ~ 1.9 million (wrong)"

      In hive the query used works fine:

      insert into table join_hive 
      select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk); 
      

      customer: "# of rows ~ 1.8 million (correct answer, verified using an SQL database as well)"

      Attachments

        Activity

          People

            marcelk Marcel Kinard
            marcelk Marcel Kinard
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: