The DataStream API currently has two aggregation functions that can be used on windows and in state, both of which have limitations:
- ReduceFunction only supports one type as the type that is added and aggregated/returned.
- FoldFunction Supports different types to add and return, but is not distributive, i.e. it cannot be used for hierarchical aggregation, for example to split the aggregation into to pre- and final-aggregation.
I suggest to add a generic and powerful aggregation function that supports:
- Different types to add, accumulate, and return
- The ability to merge partial aggregated by merging the accumulated type.
The proposed interface is below. This type of interface is found in many APIs, like that of various databases, and also in Apache Beam:
- The accumulator is the state of the running aggregate
- Accumulators can be merged
- Values are added to the accumulator
- Getting the result from the accumulator perform an optional finalizing operation