We have recently seen cases where brokers end up in a bad state after a roll, with fetch session evictions occurring at a high rate (> 16 per second). The increase in eviction rate was accompanied by the following pattern in our logs:
This pattern appears to be problematic for two reasons. First, the replica fetcher for broker 4 was clearly able to send multiple incremental fetch requests to broker 6 and receive replies, and it did so right up to the point where broker 6 evicted its fetch session, within milliseconds of several of those fetch requests. Second, replica fetchers are considered privileged for the fetch session cache and should not be evicted by consumer fetch sessions. This cluster has only 12 brokers and 1000 fetch session cache slots (the default for max.incremental.fetch.session.cache.slots), so it is very unlikely that this session was evicted by another replica fetcher session.
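To make the expectation concrete, here is a rough sketch of the reasoning in Python. It is a toy model, not the broker's actual eviction code: the `Session` class and both functions are hypothetical, the rule only captures the two properties described above (recently used sessions are not stale, and consumer sessions should not displace replica fetcher sessions), and the 120000 ms staleness threshold and one replica fetcher thread per source broker are assumed values.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical stand-in for a cached fetch session."""
    session_id: int
    privileged: bool      # True for replica fetchers, False for consumers
    last_used_ms: int     # timestamp of the most recent fetch request


def may_evict(newcomer_privileged: bool, victim: Session,
              now_ms: int, eviction_ms: int) -> bool:
    """Toy eviction rule (an assumption, not the real implementation).

    A session that has been idle longer than eviction_ms is treated as stale
    and may be evicted by anyone. Otherwise, an unprivileged newcomer may not
    evict a privileged session.
    """
    if now_ms - victim.last_used_ms > eviction_ms:
        return True
    if victim.privileged and not newcomer_privileged:
        return False
    return True


def max_replica_fetcher_sessions(brokers: int, fetchers_per_source: int) -> int:
    """Upper bound on replica fetcher sessions one broker has to cache,
    assuming every other broker opens fetchers_per_source sessions to it."""
    return (brokers - 1) * fetchers_per_source


if __name__ == "__main__":
    # A follower session that fetched 5 ms ago, checked against an
    # assumed 120000 ms staleness threshold.
    follower = Session(session_id=1, privileged=True, last_used_ms=1_000_000)
    print(may_evict(False, follower, now_ms=1_000_005, eviction_ms=120_000))
    # Replica fetcher sessions per broker versus the 1000 cache slots.
    print(max_replica_fetcher_sessions(brokers=12, fetchers_per_source=1), "of", 1000)
```

Under these assumptions a 12-broker cluster needs only a handful of privileged slots per broker, far below 1000, and a session used a few milliseconds earlier is nowhere near stale, which is why the eviction described above looks wrong.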
This cluster also appears to go through cycles of fetch session evictions, never stabilizing into a state where fetch sessions are not evicted. The logs above are the best example I could find of a session that clearly should not have been evicted.