Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26135

Invalid Anti join conversion may cause missing results

    XMLWordPrintableJSON

Details

    Description

      right now I think the following is needed to trigger the issue:

      • left outer join
      • only select left hand side columns
      • conditional which is using some udf
      • the nullness of the udf is checked

      repro sql; in case the conversion happens the row with 'a' will be missing

      drop table if exists t;
      drop table if exists n;
      
      create table t(a string) stored as orc;
      create table n(a string) stored as orc;
      
      insert into t values ('a'),('1'),('2'),(null);
      insert into n values ('a'),('b'),('1'),('3'),(null);
      
      
      explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a is null) is null;
      explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null;
      
      
      select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null;
      set hive.auto.convert.anti.join=false;
      select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null;
      
      

      resultset with hive.auto.convert.anti.join enabled:

      +------+
      | n.a  |
      +------+
      | b    |
      | 3    |
      +------+
      

      correct resultset with hive.auto.convert.anti.join disabled:

      +-------+
      |  n.a  |
      +-------+
      | a     |
      | b     |
      | 3     |
      | NULL  |
      +-------+
      

      workaround could be to disable the feature:

      set hive.auto.convert.anti.join=false;
      

      Attachments

        Issue Links

          Activity

            People

              kgyrtkirk Zoltan Haindrich
              kgyrtkirk Zoltan Haindrich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m