Calcite / CALCITE-853

EnumerableAggregate should take advantage of input collation


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.23.0
    • Component/s: None
    • Labels: None

      Description

      Li Yang <liyang@apache.org>
      Aug 20 (2 days ago)

      I encountered an out-of-memory exception when a huge result set was passed into EnumerableAggregate and aggregated in memory. I'm thinking that if the input is sorted by the group-by key, then groupBy() doesn't have to hold all the data in memory any more.
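
To illustrate the idea, here is a minimal sketch of aggregation over input that is already sorted by the group-by key. It is not Calcite code: the class `SortedCount`, the method `countSorted`, and the `sink` callback are hypothetical names chosen for this example. Because equal keys are adjacent, only one accumulator is held at a time and each group is emitted as soon as the key changes.

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Calcite.
final class SortedCount {
  /**
   * Counts rows per key. Assumes the keys arrive sorted, so that equal keys
   * are adjacent. Each (key, count) pair is passed to the sink as soon as the
   * key changes, so memory use does not grow with the number of groups.
   */
  static <K> void countSorted(Iterator<K> sortedKeys, BiConsumer<K, Long> sink) {
    if (!sortedKeys.hasNext()) {
      return; // empty input produces no groups
    }
    K current = sortedKeys.next();
    long count = 1;
    while (sortedKeys.hasNext()) {
      K next = sortedKeys.next();
      if (Objects.equals(next, current)) {
        count++;
      } else {
        sink.accept(current, count); // key changed: emit the finished group
        current = next;
        count = 1;
      }
    }
    sink.accept(current, count); // emit the last group
  }
}
```

If the input is not actually sorted, this sketch emits the same key more than once, which is why the operator needs to know its input's collation before choosing this strategy.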

      Julian Hyde <jhyde@apache.org>
      2:20 PM (16 hours ago)

      Yes, that would be useful. Please log a jira.

      Enumerable.groupBy doesn't know its input's collation, so it can't make that decision, but EnumerableAggregate does. I think that EnumerableAggregate should have a "trigger key", a subset of its group key, and if the trigger key changes it will emit the groups accumulated so far and flush its hash table.

      As well as for your use case, it will be useful for streaming queries.
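
The "trigger key" proposal can be sketched as follows. This is an illustration under stated assumptions, not the implementation that resolved this issue: the class `TriggerKeyAggregate`, the `Row` type, and the column names `region` and `product` are all hypothetical. The group key is (region, product), the input is assumed to be sorted on region only, and region serves as the trigger key. The hash table therefore only ever holds the groups of one region.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical sketch, not part of Calcite.
final class TriggerKeyAggregate {
  /** Hypothetical input row. Group key: (region, product). Trigger key: region. */
  static final class Row {
    final String region;
    final String product;
    final long amount;

    Row(String region, String product, long amount) {
      this.region = region;
      this.product = product;
      this.amount = amount;
    }
  }

  /** Sums amount per (region, product), assuming rows are sorted by region. */
  static void sum(Iterable<Row> rows, Consumer<String> sink) {
    // Hash table for the current region only, keyed by the rest of the group key.
    Map<String, Long> sums = new LinkedHashMap<>();
    String currentRegion = null;
    boolean started = false;
    for (Row row : rows) {
      if (started && !Objects.equals(row.region, currentRegion)) {
        flush(currentRegion, sums, sink); // trigger key changed: emit and clear
      }
      started = true;
      currentRegion = row.region;
      sums.merge(row.product, row.amount, Long::sum);
    }
    if (started) {
      flush(currentRegion, sums, sink); // emit the final region
    }
  }

  private static void flush(String region, Map<String, Long> sums, Consumer<String> sink) {
    for (Map.Entry<String, Long> e : sums.entrySet()) {
      sink.accept(region + "/" + e.getKey() + "=" + e.getValue());
    }
    sums.clear();
  }
}
```

When the trigger key equals the full group key, the hash table holds a single entry at a time. When the trigger key is a strict subset, memory is bounded by the number of distinct groups within one trigger-key value.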

            People

            • Assignee: Unassigned
            • Reporter: liyang (liyang.gmt8@gmail.com)
