Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Currently if a query has COUNT(DISTINCT x) and COUNT(DISTINCT y) we compute the distinct counts separately and combine them using a join. The join isn't too expensive (because usually the GROUP BY has only a few keys) but we make multiple scans over the base table.
I think we could translate multiple distinct-counts into a GROUPING SETS query (i.e. an Aggregate with more than one element in the groupSets field). If the underlying engine can evaluate that efficiently, then we have saved ourselves a join and several scans.
Attachments
Issue Links
- relates to
-
CALCITE-6332 Optimization CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES_TO_JOIN produces incorrect results for aggregates with groupSets
- Closed