Kafka / KAFKA-7821

Streams: default cache size can lose session windows in high-throughput deployment

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.2.1, 2.1.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      We have observed that, with the default cache size, a Streams aggregator sometimes fails to find existing, open session windows while handling records. As a result, it starts a new session that overwrites the old one, and events that belong to the same session are not aggregated together.

      Our topology is fairly simple: we consume from a Kafka topic, group by key, aggregate, then produce to another topic. The aggregation uses session windows with an inactivity gap of 10 minutes and a retention period of 10 minutes. The system is deployed in production and handles about 250k messages per thread per minute (4 threads per application). The cache size is left at the default (10 MB).
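To put the figures above in proportion, here is a rough back-of-envelope sketch. It assumes the configured cache is divided evenly among the stream threads of an instance, which is how the Kafka Streams documentation describes the record cache. The class and method names are hypothetical, and the number of distinct keys is not stated in this report, so the result is only an upper bound on cache pressure, not a measurement.

```java
import java.util.Locale;

/** Back-of-envelope cache budget, using only figures quoted in this report. */
final class CacheBudget {

    /** Bytes of record cache per stream thread, assuming the total is split evenly. */
    static long perThreadCacheBytes(long totalCacheBytes, int streamThreads) {
        return totalCacheBytes / streamThreads;
    }

    /** Records a single thread processes during one inactivity gap. */
    static long recordsPerGap(long recordsPerMinute, long gapMinutes) {
        return recordsPerMinute * gapMinutes;
    }

    public static void main(String[] args) {
        long defaultCache = 10L * 1024 * 1024; // default cache size (10 MB)
        int threads = 4;                       // threads per application
        long perThread = perThreadCacheBytes(defaultCache, threads);
        long records = recordsPerGap(250_000L, 10L); // 250k/min over a 10-minute gap

        System.out.printf(Locale.ROOT, "cache per thread: %d bytes%n", perThread);
        System.out.printf(Locale.ROOT, "records per thread per gap: %d%n", records);
        System.out.printf(Locale.ROOT, "cache bytes per record: %.2f%n",
                (double) perThread / records);
    }
}
```

Under these assumptions each thread has roughly 2.5 MB of cache for about 2.5 million records per inactivity gap, so the cache cannot hold every session that is still open unless the key space is small.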

      We worked around the issue by enlarging the cache (raising the cache.max.bytes.buffering configuration parameter from 10 MB to 100 MB) and no longer observe it at all. While troubleshooting, we noticed that the lost sessions were always the older ones, which suggests that the cache is an LRU cache and is evicting windows before their inactivity gap has elapsed.
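A minimal sketch of the workaround described above, using only the JDK so that it runs without the Kafka client libraries. The configuration key is the one named in this report (Kafka Streams also exposes it as StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG). The class and method names are hypothetical, and the other settings a real application needs, such as application.id and bootstrap.servers, are omitted.

```java
import java.util.Properties;

/** Builds Streams properties with an enlarged record cache (workaround sketch). */
final class CacheWorkaround {

    /** Returns properties with cache.max.bytes.buffering set to the given size in bytes. */
    static Properties withCacheBytes(long cacheBytes) {
        Properties props = new Properties();
        // Key as named in the report; the value is the total cache for the instance.
        props.put("cache.max.bytes.buffering", cacheBytes);
        return props;
    }

    public static void main(String[] args) {
        // 100 MB, the size that made the problem disappear for the reporter.
        Properties props = withCacheBytes(100L * 1024 * 1024);
        System.out.println(props);
    }
}
```

These properties would then be merged into the configuration passed to the Streams application. This only makes eviction less likely; it does not address why an evicted session is not found again.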

      This was originally observed in 0.10.2.1. We have since upgraded to 2.1.0 and still observe the issue.

          People

            Assignee: Unassigned
            Reporter: Matthew Jarvie (MJarvie)
            Votes: 1
            Watchers: 4
