  1. Apache Drill
  2. DRILL-1212

Remove extra exchange operator when generating a two-phase aggregation plan.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

      Description

      We may see an extra hash exchange operator when the optimizer chooses a two-phase aggregation plan for a query of the form GROUP BY GB_Key ORDER BY OB_Key, where GB_Key is the same as OB_Key.

      One example of such plan with extra exchange is TPCH Q12:

      select
        l.l_shipmode,
        sum(case
              when o.o_orderpriority = '1-URGENT'
                or o.o_orderpriority = '2-HIGH'
              then 1
              else 0
            end) as high_line_count,
        sum(case
              when o.o_orderpriority <> '1-URGENT'
                and o.o_orderpriority <> '2-HIGH'
              then 1
              else 0
            end) as low_line_count
      from
        cp.`tpch/orders.parquet` o,
        cp.`tpch/lineitem.parquet` l
      where
        o.o_orderkey = l.l_orderkey
        and l.l_shipmode in ('TRUCK', 'REG AIR')
        and l.l_commitdate < l.l_receiptdate
        and l.l_shipdate < l.l_commitdate
        and l.l_receiptdate >= date '1994-01-01'
        and l.l_receiptdate < date '1994-01-01' + interval '1' year
      group by
        l.l_shipmode
      order by
        l.l_shipmode;

      The issue arises because the rule does not propagate the hash distribution up the plan, so the planner introduces an unnecessary hash exchange above the aggregation.
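
      The effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Drill's planner code: the names Node, require_hash, two_phase_agg, plan and count_exchanges are hypothetical, and the parent operator labelled "Sort" is assumed to request hash distribution on the same key as the aggregation. When the final aggregate does not advertise the distribution it already has, the parent's request is not satisfied and a second exchange is added.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Toy plan node (hypothetical; not a Drill class)."""
    name: str
    child: Optional["Node"] = None
    # Hash distribution keys this node's output is known to have, if any.
    hash_keys: Optional[Tuple[str, ...]] = None


def require_hash(node: Node, keys: Tuple[str, ...]) -> Node:
    """Return node as-is if it is already hash-distributed on keys,
    otherwise wrap it in a HashExchange on those keys."""
    if node.hash_keys == keys:
        return node
    return Node("HashExchange", node, keys)


def two_phase_agg(scan: Node, keys: Tuple[str, ...], propagate: bool) -> Node:
    """Build PartialAgg -> HashExchange -> FinalAgg.
    With propagate=False the final aggregate forgets its input distribution,
    which models the behaviour reported in this issue."""
    partial = Node("PartialAgg", scan)
    exchanged = require_hash(partial, keys)
    return Node("FinalAgg", exchanged, keys if propagate else None)


def plan(keys: Tuple[str, ...], propagate: bool) -> Node:
    """Place a parent that asks for hash distribution on the same keys."""
    agg = two_phase_agg(Node("Scan"), keys, propagate)
    return Node("Sort", require_hash(agg, keys), keys)


def count_exchanges(node: Optional[Node]) -> int:
    """Count HashExchange nodes along the single-child chain."""
    total = 0
    while node is not None:
        if node.name == "HashExchange":
            total += 1
        node = node.child
    return total


if __name__ == "__main__":
    keys = ("l_shipmode",)
    print("without propagation:", count_exchanges(plan(keys, propagate=False)))
    print("with propagation:   ", count_exchanges(plan(keys, propagate=True)))
```

      In this model, propagating the distribution from the final aggregate lets the parent's requirement be met by the exchange that already sits between the two aggregation phases.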

            People

            • Assignee: Unassigned
            • Reporter: Jinfeng Ni