Currently OPTIQ doesn't push down aggregates for bushy joins and this can be a performance limitation in some queries.
Ideally before we push down the aggregate we should take into the account the NDV and reduction of the join which we are trying to push the aggregate underneath.
On Hive to measure potential gains I wrote a query which has the following business question based on the TPC-DS schema:
Identify customers from specific gender with good credit rating that spent more than X amount of money during a specific year.
This query has 3 main components
1) Hub containing store_sales & date_dim
2) Hub containing customer and customer_demographics
And for that query I have 2 rewrites :
1) Rewrite that generates a bushy join plan
2) Rewrite that generates a bushy join plan and pushes the aggregate down
Results show that aggregate push down in a bushy plan cuts down the query runtime by half.
The aggregate push down reduces the intermediate data in the query from 384GB to 22GB.
Base query finishes in 4,781 seconds with 2,253 GB of intermediate data
Bushy query finished in 1,374 seconds with 384GB of intermediate data
Bushy query + aggregate push down finished in 605 with only 22GB of intermediate data.
Bushy query + aggregate push down