Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.6.1
Fix Version/s: None
Component/s: None
Description
After upgrading kafka.version from 3.3.2 to 3.6.1 we observed the following issue: our service has a thread pool with 32 threads, and eventually all of these threads got blocked in Kafka Streams code. Stack trace:
org.apache.kafka.streams.state.internals.RocksDBStore.get line: 397
org.apache.kafka.streams.state.internals.RocksDBStore.get line: 84
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$get$5 line: 319
org.apache.kafka.streams.state.internals.MeteredKeyValueStore$$Lambda$1454/0x0000000100970440.get line: not available
org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency line: 887
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.get line: 319
org.apache.kafka.streams.state.internals.ReadOnlyKeyValueStoreFacade.get line: 35
org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get line: 56
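For context, the blocked threads are running interactive queries against a RocksDB-backed state store. The sketch below illustrates the query pattern that goes through the CompositeReadOnlyKeyValueStore / MeteredKeyValueStore / RocksDBStore layers seen in the trace; the store name "my-store", the key/value types, and the topology wiring are illustrative assumptions, not our actual service code.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreQueryExample {

    public static void main(String[] args) {
        KafkaStreams streams = createStreams();

        // Read-only view over the (hypothetical) "my-store" state store.
        // Internally this is a CompositeReadOnlyKeyValueStore wrapping the
        // MeteredKeyValueStore and RocksDBStore layers from the stack trace.
        ReadOnlyKeyValueStore<String, String> store = streams.store(
            StoreQueryParameters.fromNameAndType("my-store",
                QueryableStoreTypes.<String, String>keyValueStore()));

        // 32 worker threads querying the store concurrently, as in our service.
        ExecutorService pool = Executors.newFixedThreadPool(32);
        for (int i = 0; i < 32; i++) {
            pool.submit(() -> {
                while (true) {
                    // After the upgrade to 3.6.1, threads eventually block inside
                    // this call (RocksDBStore.get, line 397 in the trace above).
                    String value = store.get("some-key");
                    process(value);
                }
            });
        }
    }

    private static KafkaStreams createStreams() {
        // Topology and configuration omitted; assume a topology that materializes "my-store".
        throw new UnsupportedOperationException("build topology and config here");
    }

    private static void process(String value) {
        // Placeholder for downstream business logic.
    }
}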
The issue doesn't happen immediately, i.e. the service can work fine for hours or days, but once one thread gets blocked, eventually all threads from the thread pool end up blocked on the same line of Kafka Streams code.
Please see the attached screenshot showing all thread pool threads blocked: it was taken with Azul Mission Control, and all of these threads are reported when the "Deadlock detection" checkbox is selected, which suggests there is a deadlock within Kafka Streams code.
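For reference, a similar check can be done programmatically with the standard java.lang.management API; this is only a sketch of how one might confirm deadlocked threads from inside the service (class and method names here are ours, not part of Kafka).

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {

    public static void main(String[] args) {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();

        // Reports threads deadlocked on object monitors or ownable synchronizers
        // (e.g. the locks used inside state store implementations).
        long[] deadlockedIds = threadMXBean.findDeadlockedThreads();

        if (deadlockedIds == null) {
            System.out.println("No deadlocked threads detected");
            return;
        }

        for (ThreadInfo info : threadMXBean.getThreadInfo(deadlockedIds, true, true)) {
            System.out.printf("Thread %s is blocked on %s, owned by %s%n",
                info.getThreadName(),
                info.getLockName(),
                info.getLockOwnerName());
        }
    }
}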
After rolling back to version 3.3.2, the issue went away.
I'm wondering whether this is a known issue and, if so, whether it has been fixed in any version after 3.6.1.