[IMPALA-2932] Extend DistributedPlanner::createHashJoinFragment to account for hash table build cost - ASF JIRA

Attach files

Attach Screenshot

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.5.0
Fix Version/s: Impala 2.7.0
Component/s: Frontend
Labels:
- performance
- planner

Target Version:

Impala 2.7.0

Description

Currently the decision to do a shuffle Vs. broadcast join doesn't take into account the CPU/memory cost for the hash table build.
The lack of cost for hash table build results in large broadcast joins on big clusters which ends up being less efficient than a shuffle join.

Other factors that need to be taken into account are :

Query memory limit (from runtime or MEM_LIMIT)
Number of concurrent fragments in the query and memory requirement by each fragment
Number of broadcast joins and shuffle joins in the current fragment

This query can be used to verify the fix

select d_year AS cs_sold_year,
    sum(cs_quantity) cs_qty,
    sum(cs_wholesale_cost) cs_wc,
    sum(cs_sales_price) cs_sp
   from catalog_sales
   left  join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
   join date_dim on cs_sold_date_sk = d_date_sk
   where cr_order_number is null
   group by d_year

Where the baseline is

select /* +straight_join */ d_year AS cs_sold_year,
    sum(cs_quantity) cs_qty,
    sum(cs_wholesale_cost) cs_wc,
    sum(cs_sales_price) cs_sp
   from catalog_sales
   left  join  /* +shuffle */   catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
   join date_dim on cs_sold_date_sk = d_date_sk
   where cr_order_number is null
   group by d_year