  Flink / FLINK-2237 Add hash-based Aggregation / FLINK-3479

Add hash-based strategy for CombineFunction


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Task
    • Labels:
      None

      Description

      This issue is similar to FLINK-3477, but it adds a hash-based strategy for CombineFunction rather than for ReduceFunction.

      The interface of CombineFunction differs from that of ReduceFunction: its combine method receives an Iterable<T> instead of two T values. Hence, if we pass it an Iterable<T> that yields exactly two values, we can apply a CombineFunction the same way as a ReduceFunction.
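A minimal sketch of this equivalence, assuming a CombineFunction whose output type equals its input type. To stay self-contained, it declares simplified stand-ins that only mirror the method shapes of Flink's CombineFunction and ReduceFunction; `CombineAsReduce` and `SumCombiner` are hypothetical names used for illustration, not existing Flink classes.

```java
import java.util.Arrays;

// Simplified stand-ins that mirror the method shapes of Flink's
// CombineFunction and ReduceFunction (org.apache.flink.api.common.functions).
interface CombineFunction<IN, OUT> {
    OUT combine(Iterable<IN> values) throws Exception;
}

interface ReduceFunction<T> {
    T reduce(T value1, T value2) throws Exception;
}

// Hypothetical adapter: applies a CombineFunction<T, T> as if it were a
// ReduceFunction<T> by handing it an Iterable that yields exactly two values.
final class CombineAsReduce<T> implements ReduceFunction<T> {
    private final CombineFunction<T, T> combiner;

    CombineAsReduce(CombineFunction<T, T> combiner) {
        this.combiner = combiner;
    }

    @Override
    public T reduce(T value1, T value2) throws Exception {
        return combiner.combine(Arrays.asList(value1, value2));
    }
}

// Hypothetical example CombineFunction: sums all integers it is given.
final class SumCombiner implements CombineFunction<Integer, Integer> {
    @Override
    public Integer combine(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

Because the combiner's output type equals its input type, the adapted reducer can be applied again to its own result, e.g. `reduce(reduce(a, b), c)`, which is what allows a hash table to keep a single running value per key.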

      Currently, a CombineFunction is wrapped in a GroupCombineFunction and is therefore executed by the GroupReduceCombineDriver.
      We should add two dedicated drivers, CombineDriver and ChainedCombineDriver, and two driver strategies, HASH_COMBINE and SORT_COMBINE.
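The difference between the two proposed strategies can be sketched on an in-memory list, under simplifying assumptions: the whole input fits in memory, a `LinkedHashMap` stands in for Flink's managed-memory hash table, and the enum `CombineStrategy`, the `Pair` type, and the helpers in `CombineSketch` are hypothetical names, not the proposed drivers themselves. The hash variant calls `combine()` with two values at a time; the sort variant calls it once per run of equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-in mirroring the shape of Flink's CombineFunction.
interface CombineFunction<IN, OUT> {
    OUT combine(Iterable<IN> values) throws Exception;
}

// Hypothetical enum named after the two proposed driver strategies.
enum CombineStrategy { HASH_COMBINE, SORT_COMBINE }

// Hypothetical example record type: a (key, value) pair.
final class Pair {
    final String key;
    final int value;

    Pair(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

final class CombineSketch {
    static final Function<Pair, String> KEY = p -> p.key;

    // Example CombineFunction: sums the values of pairs that share a key.
    static final CombineFunction<Pair, Pair> SUM = values -> {
        String key = null;
        int sum = 0;
        for (Pair p : values) {
            key = p.key;
            sum += p.value;
        }
        return new Pair(key, sum);
    };

    // HASH_COMBINE: keep one partial result per key and fold every new record
    // into it by calling combine() on a two-element Iterable.
    static <K, T> List<T> hashCombine(List<T> input, Function<T, K> key,
                                      CombineFunction<T, T> fn) throws Exception {
        Map<K, T> table = new LinkedHashMap<>();
        for (T record : input) {
            K k = key.apply(record);
            T partial = table.get(k);
            table.put(k, partial == null ? record : fn.combine(Arrays.asList(partial, record)));
        }
        // Assumption: a real driver would also emit and clear the table when memory runs out.
        return new ArrayList<>(table.values());
    }

    // SORT_COMBINE: sort by key, then call combine() once per run of equal keys.
    static <K extends Comparable<K>, T> List<T> sortCombine(List<T> input, Function<T, K> key,
                                                            CombineFunction<T, T> fn) throws Exception {
        List<T> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(key));
        List<T> out = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            K k = key.apply(sorted.get(start));
            int end = start + 1;
            while (end < sorted.size() && key.apply(sorted.get(end)).equals(k)) {
                end++;
            }
            out.add(fn.combine(sorted.subList(start, end)));
            start = end;
        }
        return out;
    }

    // Dispatches to one of the two strategies.
    static <K extends Comparable<K>, T> List<T> combine(CombineStrategy strategy, List<T> input,
            Function<T, K> key, CombineFunction<T, T> fn) throws Exception {
        return strategy == CombineStrategy.HASH_COMBINE
                ? hashCombine(input, key, fn)
                : sortCombine(input, key, fn);
    }
}
```

For the summing combiner above, both strategies produce one result per key. The hash variant skips the sort, but it needs room for one partial value per distinct key, so a production driver would presumably have to emit its partial results early whenever the table fills up.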

      If FLINK-3477 is resolved, we can reuse its hash table.

      We should also add compiler hints to `DataSet.reduceGroup()` and `Grouping.reduceGroup()` that let users choose between sort-based and hash-based combine strategies (HASH will only be applicable to CombineFunction, not to GroupCombineFunction).
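Such a hint does not exist for `reduceGroup()` at the time of this issue, so the following is only a hypothetical sketch of how the proposed selection rule could look. `CombineHint`, `CombineDriverStrategy`, `CombineHintSelector`, and the label `SORTED_GROUP_COMBINE` (standing for the existing GroupReduceCombineDriver path) are illustrative names; the interfaces are simplified stand-ins that only mirror the shapes of Flink's Collector, CombineFunction, and GroupCombineFunction. Treating a missing hint (`null`) as a sort-based combine is likewise an assumption.

```java
// Simplified stand-ins mirroring the shapes of Flink's Collector,
// CombineFunction and GroupCombineFunction.
interface Collector<T> {
    void collect(T record);
}

interface CombineFunction<IN, OUT> {
    OUT combine(Iterable<IN> values) throws Exception;
}

interface GroupCombineFunction<IN, OUT> {
    void combine(Iterable<IN> values, Collector<OUT> out) throws Exception;
}

// Hypothetical user-facing hint for reduceGroup().
enum CombineHint { SORT, HASH }

// Hypothetical driver strategies: the two proposed ones, plus a label for the
// existing sort-based GroupReduceCombineDriver path.
enum CombineDriverStrategy { HASH_COMBINE, SORT_COMBINE, SORTED_GROUP_COMBINE }

final class CombineHintSelector {
    // Picks a combine strategy from the user function and an optional hint (null = no hint).
    static CombineDriverStrategy select(Object udf, CombineHint hint) {
        boolean isCombineFunction = udf instanceof CombineFunction;
        if (hint == CombineHint.HASH) {
            if (!isCombineFunction) {
                // Per the issue, HASH applies only to CombineFunction, not GroupCombineFunction.
                throw new IllegalArgumentException("HASH combine requires a CombineFunction");
            }
            return CombineDriverStrategy.HASH_COMBINE;
        }
        // SORT or no hint: keep a sort-based combine.
        return isCombineFunction ? CombineDriverStrategy.SORT_COMBINE
                                 : CombineDriverStrategy.SORTED_GROUP_COMBINE;
    }
}
```

Rejecting HASH for a GroupCombineFunction, rather than silently falling back to sorting, is one possible design choice; an optimizer could equally ignore the hint and keep the sort-based driver.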

            People

            • Assignee:
              Unassigned
            • Reporter:
              Fabian Hueske (fhueske)
            • Votes:
              0
            • Watchers:
              5
