Hive / HIVE-14257

CBO: Push Join through Groupby to trigger shuffle reductions

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

    Description

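      Hive already has an optimization that pushes aggregates through a join (hive.transpose.aggr.join=true). A similar optimization could push a join through a group-by, so that the join's filter reduces the data shuffled for the aggregate. For example: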

      select count(v) from (select d_year, count(ss_item_sk) as v from store_sales, date_dim where ss_sold_date_sk=d_Date_sk group by d_year) w, date_dim d where d.d_year = w.d_year and d_date_sk = 1;
      

      This query currently computes the aggregate for every year and then discards all of it (because, obviously, there is no data for d_date_sk=1).

      This example is a simplified version of the join condition in TPC-DS Query 59, where scans could be reduced by pushing the filter d_month_seq between 1185 and 1185 + 11 into the wss alias.
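
      The intended effect can be sketched by hand on the simplified query. The snippet below is only an illustration under stated assumptions: SQLite stands in for Hive, the two tables are toy stand-ins holding just the columns the query uses, the alias dd and the IN subquery are one possible hand-written shape of the rewrite, and unlike the case described above the toy data does contain a row for d_date_sk = 1 so that the result is non-trivial. It is not the plan the CBO would actually produce.

```python
import sqlite3

# Toy stand-ins for the TPC-DS tables (assumption: only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_year INTEGER);
    CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_item_sk INTEGER);
    INSERT INTO date_dim VALUES (1, 1999), (2, 1999), (3, 2000), (4, 2001);
    INSERT INTO store_sales VALUES (1, 10), (2, 11), (2, 12), (3, 13), (4, 14);
""")

# Inner aggregate from the issue; {extra} is a slot for a pushed-down predicate.
INNER = """
    SELECT dd.d_year AS d_year, count(ss_item_sk) AS v
      FROM store_sales, date_dim dd
     WHERE ss_sold_date_sk = dd.d_date_sk {extra}
     GROUP BY dd.d_year
"""

# Outer join against date_dim, filtered to a single date key.
OUTER = """
    SELECT count(v)
      FROM ({inner}) w, date_dim d
     WHERE d.d_year = w.d_year AND d.d_date_sk = 1
"""

# Hand-written pushdown: only years that can survive the outer join are
# aggregated. The join key (d_year) is also the group-by key, so dropping
# the other groups early cannot change the final result.
PUSHED = "AND dd.d_year IN (SELECT d_year FROM date_dim WHERE d_date_sk = 1)"


def run(sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()


inner_full = INNER.format(extra="")
inner_pushed = INNER.format(extra=PUSHED)

original_result = run(OUTER.format(inner=inner_full))
rewritten_result = run(OUTER.format(inner=inner_pushed))

# Number of groups the inner aggregate has to produce in each variant.
groups_full = len(run(inner_full))
groups_pushed = len(run(inner_pushed))

print("same result:", original_result == rewritten_result)
print("groups aggregated without pushdown:", groups_full)
print("groups aggregated with pushdown:", groups_pushed)
```

      Both variants return the same count, but the second one aggregates only the years that can match the outer filter, which is the shuffle reduction this issue asks the optimizer to find automatically.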

          People

            Assignee: Unassigned
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 4
