Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-90

Right outer Join includes NULLs as well and hence wrong result count

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 0.3
    • Impala 1.0
    • None

    Description

      When using the right outer join, the result set is showing rows that are duplicates and with NULLs. For eg:

      The left table looks like this (first column is the join_col)
      A ---- 1
      B ---- 2
      C ---- 3

      right table looks like (first column is the join_col)
      A
      B
      D
      E

      The result:

      A — 1
      A — NULL
      B — 2
      B — NULL
      D — NULL
      E — NULL

      The Impala query -

      insert into table join_impala 
      select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk); 
      

      customer: "# of rows ~ 1.9 million (wrong)"

      In hive the query used works fine:

      insert into table join_hive 
      select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk); 
      

      customer: "# of rows ~ 1.8 million (correct answer, verified using an SQL database as well)"

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            marcelk Marcel Kinard
            marcelk Marcel Kinard
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment