Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-24999

HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations

    XMLWordPrintableJSON

Details

    Description

      The problem can be reproduced by using the following query which at the moment can be found in subquery_in.q file:

      explain cbo select * from part where p_name IN (select p_name from part p where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
      

      The plans before and after HiveSubQueryRemoveRule are shown below:

      2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan before removing subquery:
      HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
        HiveFilter(condition=[IN($1, {
      HiveProject(p_name=[$1])
        HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
          HiveTableScan(table=[[default, part]], table:alias=[p])
      })])
          HiveTableScan(table=[[default, part]], table:alias=[part])
      
      2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan just after removing subquery:
      HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
        HiveFilter(condition=[=($1, $12)])
          LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{5}])
            HiveTableScan(table=[[default, part]], table:alias=[part])
            HiveProject(p_name=[$1])
              HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
                HiveTableScan(table=[[default, part]], table:alias=[p])
      

      The plan after applying the rule is invalid. The HiveFilter(condition=[=($1, $12)]) above the correlate references columns ($12) from the right input which do not exist since the correlate is of type SEMI. Running the test with -Dcalcite.debug property enabled raises an AssertionError when building the HiveFilter.

      The problem is hidden at the moment since there is a specific hack in HiveRelDecorrelator that turns this invalid plan into a valid one. This mechanism is very brittle and it can break easily as it happened while fixing HIVE-24957.

      Attachments

        Issue Links

          Activity

            People

              zabetak Stamatis Zampetakis
              zabetak Stamatis Zampetakis
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m