[HIVE-21382] Group by keys reduction optimization - keys are not reduced in query23 - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: None
Labels:
- pull-request-available

Target Version/s:

4.0.0

Description

explain cbo with frequent_ss_items as 
 (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt
  from store_sales
      ,date_dim 
      ,item
  where ss_sold_date_sk = d_date_sk
    and ss_item_sk = i_item_sk 
    and d_year in (1999,1999+1,1999+2,1999+3)
  group by substr(i_item_desc,1,30),i_item_sk,d_date
  having count(*) >4)
select  sum(sales)
 from ((select cs_quantity*cs_list_price sales
       from catalog_sales
           ,date_dim 
       where d_year = 1999 
         and d_moy = 1 
         and cs_sold_date_sk = d_date_sk 
         and cs_item_sk in (select item_sk from frequent_ss_items))) subq limit 100;

HiveSortLimit(fetch=[100])
  HiveProject($f0=[$0])
    HiveAggregate(group=[{}], agg#0=[sum($0)])
      HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
        HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
          HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
            HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], cs_quantity=[$18], cs_list_price=[$20])
              HiveFilter(condition=[IS NOT NULL($0)])
                HiveTableScan(table=[[perf_constraints, catalog_sales]], table:alias=[catalog_sales])
            HiveProject(d_date_sk=[$0])
              HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
                HiveTableScan(table=[[perf_constraints, date_dim]], table:alias=[date_dim])
          HiveProject(i_item_sk=[$1])
            HiveFilter(condition=[>($3, 4)])
              HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
                HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
                  HiveJoin(condition=[=($1, $4)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
                    HiveJoin(condition=[=($0, $2)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
                      HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
                        HiveFilter(condition=[IS NOT NULL($0)])
                          HiveTableScan(table=[[perf_constraints, store_sales]], table:alias=[store_sales])
                      HiveProject(d_date_sk=[$0], d_date=[$2])
                        HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
                          HiveTableScan(table=[[perf_constraints, date_dim]], table:alias=[date_dim])
                    HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
                      HiveTableScan(table=[[perf_constraints, item]], table:alias=[item])

Right side of HiveSemiJoin has an aggregate which could be reduce to have only i_item_sk as group by key since i_item_sk is primary key.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-21382.1.patch
06/Mar/19 01:57
13 kB
Vineet Garg
HIVE-21382.2.patch
07/Mar/19 22:59
125 kB
Vineet Garg
HIVE-21382.2.patch
07/Mar/19 22:16
125 kB
Vineet Garg
HIVE-21382.3.patch
12/Mar/19 05:33
144 kB
Vineet Garg
HIVE-21382.4.patch
14/Mar/19 03:03
143 kB
Vineet Garg
HIVE-21382.5.patch
14/Mar/19 19:36
143 kB
Vineet Garg
HIVE-21382.6.patch
15/Mar/19 17:57
143 kB
Vineet Garg

Issue Links

Dependent

HIVE-21387 Wrong result for UNION query with GROUP BY consisting of PK columns

Closed

is related to

HIVE-21340 CBO: Prune non-key columns feeding into a SemiJoin

Closed

links to

GitHub Pull Request #567

Activity

People

Assignee:: Vineet Garg

Reporter:: Vineet Garg

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 04/Mar/19 22:02

Updated:: 27/Feb/24 22:24

Resolved:: 16/Mar/19 00:22

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3h 10m