[KAFKA-7820] distinct count kafka streams api - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels:
- needs-kip

Description

We use Kafka Streams for our real-time analytics use cases. Most of them involve computing a distinct count over certain fields.

Currently we compute the distinct count by storing the hashed value of each record's field in a set and counting the entries as events flow in. Doing this in application memory is challenging, because the set of hashes can only grow as far as the allotted memory allows. Under high volume or a traffic spike, the set of hashes for the distinct-count fields grows beyond the allotted memory size, leading to issues.
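For illustration, the in-memory approach described above might look roughly like the following. This is a minimal sketch in plain Java, not the reporter's actual code and not a Kafka Streams API; the class name `InMemoryDistinctCounter` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of set-based distinct counting held in application memory. */
class InMemoryDistinctCounter {
    // One set of value hashes per grouping key. Each set grows with the
    // number of distinct values seen, so memory use is unbounded.
    private final Map<String, Set<Integer>> seenHashes = new HashMap<>();

    /** Records one event and returns the current distinct count for its grouping key. */
    int observe(String groupKey, String fieldValue) {
        Set<Integer> seen = seenHashes.computeIfAbsent(groupKey, k -> new HashSet<>());
        // Only the hash is stored; two values with the same hash are counted once.
        seen.add(fieldValue.hashCode());
        return seen.size();
    }

    /** Returns the distinct count for a grouping key, or 0 if the key was never seen. */
    int distinctCount(String groupKey) {
        Set<Integer> seen = seenHashes.get(groupKey);
        return seen == null ? 0 : seen.size();
    }

    /** Total number of stored hashes across all keys, a rough proxy for memory use. */
    int storedEntries() {
        int total = 0;
        for (Set<Integer> seen : seenHashes.values()) {
            total += seen.size();
        }
        return total;
    }
}
```

The value returned by `storedEntries()` only ever increases, which is the growth that becomes a problem when traffic spikes.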

The other issue arises when we scale the app: either we need to use global KTables so that every instance gets all the values for the distinct count, which adds back pressure in the cluster, or we have to re-partition the topic and count on the key.
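The re-partitioning option can be sketched as follows, under the assumption that the topic is re-keyed by the field being counted. Each distinct value then lands in exactly one partition, so per-partition counts can simply be added up. This is a plain Java simulation with hypothetical names (`RepartitionedDistinctCount`, `distinctCount`), not Kafka Streams code, and the modulo-of-hash partitioner is only a stand-in for whatever partitioner the cluster uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of distinct counting after re-partitioning by the counted field. */
class RepartitionedDistinctCount {

    /** Routes each value to one partition by hash, counts per partition, then sums. */
    static int distinctCount(List<String> values, int numPartitions) {
        List<Set<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashSet<>());
        }
        for (String value : values) {
            // Equal values always map to the same partition, so no value
            // is ever counted by two different partitions.
            int partition = Math.floorMod(value.hashCode(), numPartitions);
            partitions.get(partition).add(value);
        }
        int total = 0;
        for (Set<String> localDistinct : partitions) {
            total += localDistinct.size(); // local counts are disjoint, so they add up
        }
        return total;
    }
}
```

The result does not depend on `numPartitions`, but each partition still holds its share of the distinct values in memory, so this spreads the memory cost across instances rather than removing it.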

Can we have a feature where distinct count is supported by the Streams API at the framework level, rather than having to handle it at the application level?

People

Assignee: Unassigned
Reporter: Vinoth Rajasekar
Votes: 0
Watchers: 4

Dates

Created: 14/Jan/19 19:56
Updated: 17/Jan/19 03:50