Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up being order preserving, you can replace the COUNT(DISTINCT pkCol) with a COUNT(pkCol) in the SELECT, HAVING, and ORDER BY clauses. That prevents the DistinctValueWithCountServerAggregator, which keeps a Map of all unique values, from being used. Instead we just keep a single overall count, which is all we need thanks to your DistinctPrefixFilter.
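To make the memory argument concrete, here is a minimal sketch in plain Java. It is not Phoenix code: the class CountSketch and both methods are hypothetical stand-ins, assuming the values reaching the aggregator have already been reduced to one row per distinct key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Phoenix classes.
class CountSketch {
    // Mimics a distinct-value aggregator: state grows with the number of unique values.
    static int countDistinctWithMap(List<String> values) {
        Map<String, Integer> seen = new HashMap<>();
        for (String v : values) {
            if (v != null) {
                seen.merge(v, 1, Integer::sum);
            }
        }
        return seen.size();
    }

    // Mimics a plain COUNT: state is a single counter regardless of input size.
    static long plainCount(List<String> values) {
        long count = 0;
        for (String v : values) {
            if (v != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed to be already deduplicated upstream (one row per distinct key).
        List<String> deduplicated = Arrays.asList("a", "b", "c");
        System.out.println(countDistinctWithMap(deduplicated));
        System.out.println(plainCount(deduplicated));
    }
}
```

The two methods agree only when the input contains no duplicates. For a list such as "a", "a", "b" the plain count is 3 while the distinct count is 2, which is why the rewrite depends on the filter having removed duplicates first.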
A few considerations in the implementation:
- Pass select through in the call to groupBy.compile() in QueryCompiler and change the return type so that it returns a new select (as the SELECT, HAVING, and ORDER BY may have been rewritten). It's probably easiest if the GroupBy object is just mutated in place.
- Within the groupBy.compile() call, use a visitor on the SELECT, HAVING, and ORDER BY clauses to do the rewriting. You can do that by deriving a class from ParseNodeRewriter and overriding the visitLeave(final FunctionParseNode node, List<ParseNode> nodes) method so that it returns a new COUNT parse node, with the nodes passed in as children, if node equals the DistinctCountParseNode that you replaced in the select statement.
- The compilation of the HAVING clause should be moved after the call to groupBy.compile() in QueryCompiler, since the clause may have been rewritten in that call, like this:
  select = groupBy.compile(context, select, innerPlanTupleProjector);
  Expression having = HavingCompiler.compile(context, select, groupBy);
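The rewriting step from the second consideration above could look roughly like the following self-contained sketch. It deliberately does not use Phoenix's ParseNodeRewriter: Node, rewrite, and visitLeave are hypothetical stand-ins on a toy tree, meant only to show the bottom-up "rebuild children, then swap the matching node" pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical toy parse tree; not the Phoenix parse node classes.
class RewriteSketch {
    static final class Node {
        final String name;
        final boolean distinct;
        final List<Node> children;

        Node(String name, boolean distinct, List<Node> children) {
            this.name = name;
            this.distinct = distinct;
            this.children = children;
        }

        static Node leaf(String name) {
            return new Node(name, false, Collections.<Node>emptyList());
        }

        // Structural equality, so the same expression matches in SELECT, HAVING, and ORDER BY.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node other = (Node) o;
            return name.equals(other.name) && distinct == other.distinct
                    && children.equals(other.children);
        }

        @Override public int hashCode() {
            return Objects.hash(name, distinct, children);
        }

        @Override public String toString() {
            if (children.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            if (distinct) sb.append("DISTINCT ");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites children first, then gives the node itself a chance to be replaced.
    static Node rewrite(Node node, Node target) {
        List<Node> rewritten = new ArrayList<>();
        for (Node child : node.children) {
            rewritten.add(rewrite(child, target));
        }
        return visitLeave(node, rewritten, target);
    }

    // Replaces the targeted COUNT(DISTINCT ...) with a plain COUNT over the same children.
    static Node visitLeave(Node node, List<Node> children, Node target) {
        if (node.equals(target)) {
            return new Node("COUNT", false, children);
        }
        return new Node(node.name, node.distinct, children);
    }

    public static void main(String[] args) {
        Node target = new Node("COUNT", true, Arrays.asList(Node.leaf("pk")));
        Node having = new Node("GT", false, Arrays.asList(
                new Node("COUNT", true, Arrays.asList(Node.leaf("pk"))), Node.leaf("10")));
        System.out.println(rewrite(having, target));
    }
}
```

In this toy model a HAVING tree printed as GT(COUNT(DISTINCT pk), 10) comes back as GT(COUNT(pk), 10), while nodes that do not equal the target are rebuilt unchanged. Running the same rewrite over the SELECT and ORDER BY trees before compiling HAVING matches the ordering suggested in the last consideration.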