Details
-
Improvement
-
Status: In Progress
-
Minor
-
Resolution: Unresolved
-
3.2.0
-
None
-
None
Description
Currently, RewriteDistinctAggregates rule will rewrite distinct aggregates with extra Expand node. This causes unnecessary memory waste and performance fallback for some specific aggregates(e.g. count distinct case when).
We can rewrite count distinct case when aggregates without Expand:
for query:
- {{
{
* SELECT
* cat1,
* COUNT(DISTINCT CASE WHEN cond1 THEN cat2 ELSE null end) as cat2_cnt1,
* COUNT(DISTINCT CASE WHEN cond2 THEN cat2 ELSE null end) as cat2_cnt2,
* FROM
* data
* GROUP BY
* key
* }
}}
we should rewrite to :
- {{
{
* Aggregate(
* key = ['key]
* functions = [count(if (01 & 'bit_vector != 0) 0 else null),
* count(if (10 & 'bit_vector != 0) 0 else null)]
* output = ['key, 'cat2_cnt1, 'cat2_cnt2])
* Aggregate(
* key = ['key, 'cat1]
* functions = [bit_or(if (cond1) 01 else 00, if (cond2) 10 else 00)]
* output = ['key, 'cat1, 'bit_vector])
* LocalTableScan [...]
* }
}}
This method will improve performance and reduce memory waste