Description
Currently we have the following aggregation APIs in the Streams DSL:
KStream.aggregateByKey(..) KStream.reduceByKey(..) KStream.countByKey(..) KTable.groupBy(...).aggregate(..) KTable.groupBy(...).reduce(..) KTable.groupBy(...).count(..)
And it is better to add common aggregation functions like Sum and Avg as built-in into the Streams DSL. A few questions to ask though:
1. Should we add those built-in functions as, for example KTable.groupBy(...).sum(...)} or {{KTable.groupBy(...).aggregate(SUM, ...). Please see the comments below for detailed pros and cons.
2. If we go with the second option above, should we replace the countByKey / count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel it is not necessary, as COUNT is a special aggregate function since we do not need to map on any value fields; this is the same approach as in Spark as well, where Count is built-in as first-citizen in the DSL, and others are built-in as aggregate(SUM), etc.
Attachments
Issue Links
- links to