[CALCITE-853] EnumerableAggregate should take advantage of input collation - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.23.0
Component/s: None
Labels: None

Description

Li Yang <liyang@apache.org>
Aug 20

I encountered an out-of-memory error when a huge result set was passed into EnumerableAggregate and aggregated in memory. I'm thinking that if the input is sorted by the group-by key, then groupBy() doesn't have to hold all the data in memory any more.
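For illustration, the idea above can be sketched as a streaming aggregate over sorted input: because equal keys are adjacent, a group is complete as soon as the key changes, so only one group's accumulator is held at a time. This is a minimal sketch, assuming a single String group key, a SUM aggregate, and rows already sorted on that key; `SortedGroupSum` and `aggregate` are hypothetical names, not Calcite's `EnumerableAggregate` or `Enumerable.groupBy` API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: streaming SUM over input already sorted by the group key.
 * Not Calcite's EnumerableAggregate or Enumerable.groupBy.
 */
public final class SortedGroupSum {
  private SortedGroupSum() {}

  /** Emits one (key, sum) per run of equal keys; only the current group is held in memory. */
  public static void aggregate(Iterator<Map.Entry<String, Long>> sortedRows,
      Consumer<Map.Entry<String, Long>> emit) {
    String currentKey = null;
    long sum = 0;
    boolean haveGroup = false;
    while (sortedRows.hasNext()) {
      Map.Entry<String, Long> row = sortedRows.next();
      if (haveGroup && !Objects.equals(currentKey, row.getKey())) {
        // Key changed: since the input is sorted, the previous group is complete.
        emit.accept(Map.entry(currentKey, sum));
        sum = 0;
      }
      currentKey = row.getKey();
      sum += row.getValue();
      haveGroup = true;
    }
    if (haveGroup) {
      emit.accept(Map.entry(currentKey, sum)); // flush the last group at end of input
    }
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> rows = List.of(
        Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("b", 5L),
        Map.entry("c", 3L), Map.entry("c", 4L));
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    aggregate(rows.iterator(), out::add);
    System.out.println(out); // [a=3, b=5, c=7]
  }
}
```

If the input were not sorted, the same key could reappear after its group had been emitted, producing duplicate output rows; the sort order is what makes the early emit safe.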

Julian Hyde <jhyde@apache.org>
2:20 PM

Yes, that would be useful. Please log a JIRA.

Enumerable.groupBy doesn't know its input's collation, so it can't make that decision, but EnumerableAggregate does. I think that EnumerableAggregate should have a "trigger key", a subset of its group key; when the trigger key changes, it would emit and flush its hash table.

Besides your use case, this would also be useful for streaming queries.
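The "trigger key" proposal can be sketched under the assumption that the group key is (trigger, rest) and the input is sorted on the trigger column only. Rows sharing a trigger value are aggregated in a hash table keyed by the remaining group-key column; when the trigger value changes, every group for the previous value is complete, so the table is emitted and cleared. `TriggerKeyAggregate`, `Row`, and `flush` are hypothetical names chosen for this sketch, and SUM stands in for an arbitrary aggregate function; this is not the code that resolved the issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a "trigger key" aggregate: group key = (trigger, rest),
 * input sorted on trigger only. Not Calcite code.
 */
public final class TriggerKeyAggregate {
  private TriggerKeyAggregate() {}

  /** Hypothetical input row: trigger-key column, remaining group-key column, measure. */
  public static final class Row {
    final String trigger;
    final String rest;
    final long value;

    public Row(String trigger, String rest, long value) {
      this.trigger = trigger;
      this.rest = rest;
      this.value = value;
    }
  }

  /** Emits ([trigger, rest], sum); the hash table only holds groups for one trigger value. */
  public static void aggregate(Iterator<Row> rowsSortedOnTrigger,
      Consumer<Map.Entry<List<String>, Long>> emit) {
    Map<String, Long> table = new LinkedHashMap<>(); // rest -> running SUM
    String currentTrigger = null;
    while (rowsSortedOnTrigger.hasNext()) {
      Row row = rowsSortedOnTrigger.next();
      if (!table.isEmpty() && !Objects.equals(currentTrigger, row.trigger)) {
        // Trigger key changed: all groups for the previous trigger value are complete.
        flush(currentTrigger, table, emit);
      }
      currentTrigger = row.trigger;
      table.merge(row.rest, row.value, Long::sum);
    }
    flush(currentTrigger, table, emit); // end of input; emits nothing if the table is empty
  }

  /** Emits every entry of the hash table, then clears it. */
  private static void flush(String trigger, Map<String, Long> table,
      Consumer<Map.Entry<List<String>, Long>> emit) {
    for (Map.Entry<String, Long> e : table.entrySet()) {
      emit.accept(Map.entry(List.of(trigger, e.getKey()), e.getValue()));
    }
    table.clear();
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("d1", "p2", 4), new Row("d1", "p1", 1),
        new Row("d1", "p2", 6), new Row("d2", "p1", 3));
    List<Map.Entry<List<String>, Long>> out = new ArrayList<>();
    aggregate(rows.iterator(), out::add);
    System.out.println(out); // [[d1, p2]=10, [d1, p1]=1, [d2, p1]=3]
  }
}
```

Memory is bounded by the number of distinct `rest` values per trigger value rather than by the whole input. When the trigger key is the full group key this reduces to the sorted case, and when every row has the same trigger value it degenerates to an ordinary in-memory hash aggregate.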

Issue Links

is related to
CALCITE-2540 Streaming Sort relational operator (Open)

relates to
CALCITE-784 LogicalAggregate's create method discards any collation traits from input (Closed)

People

Assignee: Unassigned
Reporter: liyang
Votes: 0
Watchers: 2

Dates

Created: 21/Aug/15 22:58
Updated: 29/May/21 00:43
Resolved: 29/May/21 00:43