I'm creating this issue in response to Guozhang Wang's request on the mailing list:
We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but experience a severe performance degradation. The highest amount of CPU time seems spent in retrieving from the local cache. Here's an example thread profile with 0.11.0.0:
When things are running smoothly we're gated by retrieving from the state store with acceptable performance. Here's an example thread profile with 0.10.2.1:
Some investigation reveals that it appears we're performing about 3 orders magnitude more lookups on the NamedCache over a comparable time period. I've attached logs of the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
We're using session windows and have the app configured for commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
I'm happy to share more details if they would be helpful. Also happy to run tests on our data.
I also found this issue, which seems like it may be related: