Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-2237 Add hash-based Aggregation
  3. FLINK-3477

Add hash-based combine strategy for ReduceFunction

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Implemented
    • None
    • None
    • Runtime / Task
    • None

    Description

      This issue is about adding a hash-based combine strategy for ReduceFunctions.
      The interface of the reduce() method is as follows:

      public T reduce(T v1, T v2)
      

      Input type and output type are identical and the function returns only a single value. A Reduce function is incrementally applied to compute a final aggregated value. This allows to hold the preaggregated value in a hash-table and update it with each function call.

      The hash-based strategy requires special implementation of an in-memory hash table. The hash table should support in place updates of elements (if the updated value has the same size as the new value) but also appending updates with invalidation of the old value (if the binary length of the new value differs). The hash table needs to be able to evict and emit all elements if it runs out-of-memory.

      We should also add HASH and SORT compiler hints to DataSet.reduce() and Grouping.reduce() to allow users to pick the execution strategy.

      Attachments

        Issue Links

          Activity

            People

              ggevay Gábor Gévay
              fhueske Fabian Hueske
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: