Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
ghx-label-9
Description
TPC-DS Q67 would benefit significantly if we could push the rank() predicate into the sort to do some reduction of unneeded data. The sorter could evaluate this predicate if it had the partition expressions available - as a post-processing step to the in-memory sort for the analytic sort group, it could do a pass over the sorted run, resetting a counter at the start of each partition boundary.
It might be best to start with tackling IMPALA-3471 by applying the limit within sorted runs, since that doesn't require any planner work.
with results as ( select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales from store_sales ,date_dim ,store ,item where ss_sold_date_sk=d_date_sk and ss_item_sk=i_item_sk and ss_store_sk = s_store_sk and d_month_seq between 1212 and 1212 + 11 group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id) , results_rollup as (select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, s_store_id, sumsales from results union all select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy union all select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy union all select i_category, i_class, i_brand, i_product_name, d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class, i_brand, i_product_name, d_year union all select i_category, i_class, i_brand, i_product_name, null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class, i_brand, i_product_name union all select i_category, i_class, i_brand, null i_product_name, null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class, i_brand union all select i_category, i_class, null i_brand, null i_product_name, null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category, i_class union all select i_category, null i_class, null i_brand, null i_product_name, null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results group by i_category union all select null i_category, null i_class, null i_brand, null i_product_name, null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales from results) select * from (select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sumsales ,rank() over (partition by i_category order by sumsales desc) rk from results_rollup) dw2 where rk <= 100 order by i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sumsales ,rk limit 100
Assigning to myself to fill in more details.
Attachments
Issue Links
- is related to
-
IMPALA-9951 Skew in analytic sorts when partition key has low cardinality
- Open
-
IMPALA-10228 Avoid or codegen std::map comparisons in partitioned top-n
- Open
-
IMPALA-2783 Push down filters on rank similar to limit
- Resolved
- relates to
-
IMPALA-3471 TopN should be able to spill
- Open
-
IMPALA-9983 Push limit from a top level sort onto analytic sort when applicable
- Resolved
1.
|
Backend partitioned top-n operator | Resolved | Tim Armstrong |