Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-17148

Incorrect result for Hive join query with COALESCE in WHERE condition

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 3.0.0
    • Component/s: CBO
    • Labels:
      None

      Description

      The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

      STEPS TO REPRODUCE:

      Step 1: Create a table ct1
      create table ct1 (a1 string,b1 string);
      
      Step 2: Create a table ct2
      create table ct2 (a2 string);
      
      Step 3 : Insert following data into table ct1
      insert into table ct1 (a1) values ('1');
      
      Step 4 : Insert following data into table ct2
      insert into table ct2 (a2) values ('1');
      
      Step 5 : Execute the following query 
      select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
      

      ACTUAL RESULT:

      The query returns nothing;
      

      EXPECTED RESULT:

      1       NULL    1
      

      The issue seems to be because of the incorrect query plan. In the plan we can see:
      predicate:(a1 is not null and b1 is not null)
      which does not look correct. As a result, it is filtering out all the rows is any column mentioned in the COALESCE has null value.
      Please find the query plan below:

      Plan optimized by CBO.
      
      Vertex dependency in root stage
      Map 1 <- Map 2 (BROADCAST_EDGE)
      
      Stage-0
        Fetch Operator
          limit:-1
          Stage-1
            Map 1
            File Output Operator [FS_10]
              Map Join Operator [MAPJOIN_15] (rows=1 width=4)
                Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
              <-Map 2 [BROADCAST_EDGE]
                BROADCAST [RS_7]
                  PartitionCols:_col0
                  Select Operator [SEL_5] (rows=1 width=1)
                    Output:["_col0"]
                    Filter Operator [FIL_14] (rows=1 width=1)
                      predicate:a2 is not null
                      TableScan [TS_3] (rows=1 width=1)
                        default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
              <-Select Operator [SEL_2] (rows=1 width=4)
                  Output:["_col0","_col1"]
                  Filter Operator [FIL_13] (rows=1 width=4)
                    predicate:(a1 is not null and b1 is not null)
                    TableScan [TS_0] (rows=1 width=4)
                      default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
      

      This happens only if join is inner type, otherwise HiveJoinAddNotRule which creates this problem is skipped.

        Attachments

        1. HIVE-17148.3.patch
          84 kB
          Vlad Gudikov
        2. HIVE-17148.2.patch
          41 kB
          Vlad Gudikov
        3. HIVE-17148.1.patch
          22 kB
          Vlad Gudikov
        4. HIVE-17148.patch
          4 kB
          Vlad Gudikov

          Activity

            People

            • Assignee:
              allgoodok Vlad Gudikov
              Reporter:
              allgoodok Vlad Gudikov
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: