Flink / FLINK-2237 Add hash-based Aggregation / FLINK-3479

Add hash-based strategy for CombineFunction

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Task
    • Labels: None

    Description

      This issue is similar to FLINK-3477 but adds a hash-based strategy for CombineFunction instead of ReduceFunction.

      The interface of CombineFunction differs from ReduceFunction by providing an Iterable<T> instead of two T values. Hence, if the Iterable<T> provides two values, we can do the same as with a ReduceFunction.
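The two-value case described above can be sketched as follows. This is a minimal illustration, not Flink code: `IterableCombiner`, `HashCombineSketch`, and `SUM` are hypothetical names, and the sketch assumes that the combine function returns the same type it consumes and that it is associative, so that partial results can be combined again.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an Iterable-based combine interface; not Flink's actual type.
interface IterableCombiner<T> {
    T combine(Iterable<T> values);
}

final class HashCombineSketch {

    // Keeps one partial aggregate per key and folds each new record into it
    // by calling the combiner with an Iterable of exactly two values.
    static <K, T> Map<K, T> hashCombine(
            Iterable<T> records, Function<T, K> keyOf, IterableCombiner<T> combiner) {
        Map<K, T> table = new LinkedHashMap<>();
        for (T record : records) {
            K key = keyOf.apply(record);
            T partial = table.get(key);
            table.put(key, partial == null
                    ? record
                    : combiner.combine(Arrays.asList(partial, record)));
        }
        return table;
    }

    // Example combiner: sums the counts of (word, count) pairs that share a word.
    static final IterableCombiner<Map.Entry<String, Integer>> SUM = values -> {
        String word = null;
        int total = 0;
        for (Map.Entry<String, Integer> e : values) {
            word = e.getKey();
            total += e.getValue();
        }
        return new SimpleEntry<>(word, total);
    };

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleEntry<>("a", 1), new SimpleEntry<>("b", 2), new SimpleEntry<>("a", 3));
        Function<Map.Entry<String, Integer>, String> keyOf = e -> e.getKey();
        // Print one combined (word, count) pair per distinct word.
        hashCombine(input, keyOf, SUM).values()
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

The combiner never sees more than two values at a time, which is why it can be driven exactly like a ReduceFunction.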

      At the moment, CombineFunction is wrapped in a GroupCombineFunction and hence executed using the GroupReduceCombineDriver.
      We should add two dedicated drivers, CombineDriver and ChainedCombineDriver, and two driver strategies, HASH_COMBINE and SORT_COMBINE.
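For contrast with a hash-based combine, a sort-based strategy could look roughly like the sketch below: sort the records by key, then hand each run of equal keys to the combine function as a single Iterable. The names `GroupValueCombiner` and `SortCombineSketch` are hypothetical, and the in-memory sort stands in for whatever sorter a real driver would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an Iterable-based combine interface; not Flink's actual type.
interface GroupValueCombiner<T> {
    T combine(Iterable<T> values);
}

final class SortCombineSketch {

    // Sorts a copy of the records by key, then combines each run of equal keys in one call.
    static <K extends Comparable<K>, T> List<T> sortCombine(
            List<T> records, Function<T, K> keyOf, GroupValueCombiner<T> combiner) {
        List<T> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(keyOf));

        List<T> result = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            K key = keyOf.apply(sorted.get(start));
            int end = start + 1;
            // Advance to the end of the run of records that share this key.
            while (end < sorted.size() && keyOf.apply(sorted.get(end)).compareTo(key) == 0) {
                end++;
            }
            result.add(combiner.combine(sorted.subList(start, end)));
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> parity = x -> x % 2;
        GroupValueCombiner<Integer> sum = values -> {
            int total = 0;
            for (int v : values) {
                total += v;
            }
            return total;
        };
        // One sum for the even numbers and one for the odd numbers.
        System.out.println(sortCombine(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6), parity, sum));
    }
}
```

Unlike the hash-based variant, the combine function here receives a whole group at once, so it does not have to be applicable to pairs of partial results.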

      If FLINK-3477 is resolved, we can reuse its hash table.

      We should also add compiler hints to `DataSet.reduceGroup()` and `Grouping.reduceGroup()` to allow users to select between SORT-based and HASH-based combine strategies (HASH will only be applicable to CombineFunction, not to GroupCombineFunction).
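How such a hint might be checked against the function type can be sketched as below. Everything here is hypothetical and exists only for illustration: the enum `CombineStrategyHint`, the two combiner interfaces, and the choice to reject an inapplicable HASH hint with an exception (falling back to SORT would be an equally plausible design). Only the strategy names HASH_COMBINE and SORT_COMBINE come from the issue text.

```java
import java.util.List;

// Hypothetical user-facing hint; not an existing Flink API.
enum CombineStrategyHint { SORT, HASH }

// Strategy names taken from the issue text; the enum itself is an assumption.
enum CombineDriverStrategy { SORT_COMBINE, HASH_COMBINE }

// Hypothetical stand-in for a combiner that returns a single value per group.
interface SingleValueCombiner<T> {
    T combine(Iterable<T> values);
}

// Hypothetical stand-in for a combiner that may emit any number of values per group.
interface MultiValueCombiner<T> {
    void combine(Iterable<T> values, List<T> out);
}

final class CombineHintSketch {

    // Maps a hint to a driver strategy; HASH is accepted only for single-value combiners.
    static CombineDriverStrategy resolve(CombineStrategyHint hint, Object combiner) {
        if (hint == CombineStrategyHint.HASH) {
            if (!(combiner instanceof SingleValueCombiner)) {
                throw new IllegalArgumentException(
                        "HASH hint requires a combiner that returns a single value per group");
            }
            return CombineDriverStrategy.HASH_COMBINE;
        }
        return CombineDriverStrategy.SORT_COMBINE;
    }

    public static void main(String[] args) {
        SingleValueCombiner<Integer> first = values -> values.iterator().next();
        System.out.println(resolve(CombineStrategyHint.HASH, first));
        System.out.println(resolve(CombineStrategyHint.SORT, first));
    }
}
```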

          People

            Assignee: Unassigned
            Reporter: Fabian Hueske (fhueske)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: