We have recently seen an issue where, after some time, a container stops receiving messages from some partitions.
After examining the container, it appears that this is triggered by an abdication for a BrokerProxy that was not instantiated in the KafkaSystemConsumer.start() method. Here's what we see:
Notice that this log line never appears from BrokerProxy:
Digging in a bit, KafkaSystemConsumer.refreshBrokers can create a new BrokerProxy that was not created in KafkaSystemConsumer.start(). This happens when a partition has moved to a broker for which the consumer has not yet created a proxy.
But refreshBrokers never starts the new proxy's thread, which would explain why the affected partitions stop receiving messages.
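
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern in Java. It is not Samza's code: ProxyRegistrySketch, Proxy, reassign, and the startIfNew flag are hypothetical names invented for illustration. It only assumes what is described above, namely that a proxy owns a thread that must be started explicitly, and that a proxy can be created lazily after the initial start-up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a consumer that keeps one proxy per broker.
class ProxyRegistrySketch {

    // Hypothetical stand-in for a broker proxy: it owns a thread that
    // does nothing until start() is called.
    static final class Proxy {
        final String broker;
        private final AtomicBoolean started = new AtomicBoolean(false);
        private final Thread thread;

        Proxy(String broker) {
            this.broker = broker;
            // A real proxy would run its fetch loop here.
            this.thread = new Thread(() -> { }, "proxy-" + broker);
            this.thread.setDaemon(true);
        }

        // Idempotent: the underlying thread is started at most once.
        void start() {
            if (started.compareAndSet(false, true)) {
                thread.start();
            }
        }

        boolean isStarted() {
            return started.get();
        }
    }

    private final Map<String, Proxy> proxies = new ConcurrentHashMap<>();

    // Initial start-up: create and start a proxy for every known broker.
    void start(Iterable<String> initialBrokers) {
        for (String broker : initialBrokers) {
            proxies.computeIfAbsent(broker, Proxy::new).start();
        }
    }

    // A partition moved to newBroker. With startIfNew == false this mirrors
    // the reported behaviour: a proxy may be created here but never started.
    // With startIfNew == true the proxy is started, which is safe to repeat
    // because Proxy.start() is idempotent.
    Proxy reassign(String newBroker, boolean startIfNew) {
        Proxy proxy = proxies.computeIfAbsent(newBroker, Proxy::new);
        if (startIfNew) {
            proxy.start();
        }
        return proxy;
    }
}
```

In this sketch, calling start() on whatever proxy the refresh path returns, whether it is new or already existed, is one way to make sure a lazily created proxy is never left idle. Whether the actual fix belongs in refreshBrokers or in the proxy creation path is a design decision for the real code.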