Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25758

OOM due to recursive application of CBO rules

    XMLWordPrintableJSON

Details

    Description

       

      Reproducing query is as follows:

      create table test1 (act_nbr string);
      create table test2 (month int);
      create table test3 (mth int, con_usd double);
      
      EXPLAIN
         SELECT c.month,
                d.con_usd
         FROM
           (SELECT cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
            FROM test1
            UNION ALL
            SELECT month
            FROM test2
            WHERE month = 202110) c
         JOIN test3 d ON c.month = d.mth; 

       

      Different plans are generated during the first CBO steps, last being:

      2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] parse.CalcitePlanner: Plan after decorre
      lation:
      HiveProject(month=[$0], con_usd=[$2])
        HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(month=[$0])
            HiveUnion(all=[true])
              HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-d
      d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-
      16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
                HiveTableScan(table=[[default, test1]], table:alias=[test1])
              HiveProject(month=[$0])
                HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
                  HiveTableScan(table=[[default, test2]], table:alias=[test2])
          HiveTableScan(table=[[default, test3]], table:alias=[d])

       

      Then, the HEP planner will keep expanding the filter expression with redundant expressions, such as the following, where the identical CAST expression is present multiple times:

       

      rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, 202110))

       

      The problem seems to come from a bad interaction of at least HiveFilterProjectTransposeRule and HiveJoinPushTransitivePredicatesRule, possibly more.

      Most probably then UNION part can be removed and the reproducer be simplified even further.

       

      Attachments

        Issue Links

          Activity

            People

              asolimando Alessandro Solimando
              asolimando Alessandro Solimando
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4h 10m
                  4h 10m