Details
Description
I consider to use Kafka in my company, so currently doing failover test.
Conditions:
- org.apache.kafka:kafka-clients:1.0.0
- New consumer using bootstrap.servers, a consumer group and a group coordinator
- num. brokers = 3 (id #1, #2, #3)
- Topic num. partitions = 3, replication factor = 3
- offsets.topic.replication.factor = 3
Reproduction Step:
- Run consumers in the same consumer group, each of them subscribe to a topic
- Kill (kill -9) #1, #2 broker simultaneously (only #3 online)
- Consumers eventually connect to #3 broker
- Start #1, #2 broker again after a while (#1, #2, #3 online)
- Kill (kill -9) #2, #3 broker simultaneously (only #1 online)
- Now consumers endlessly try to connect to #3 broker only
- Start #2 broker again after a while (#1, #2 online)
- Consumers still blindly try to connect to #3 broker
Expectation:
Consumers successfully connect to #1 broker after step 5.
Record:
I attached a consumer log file with TRACE log level. Related events below:
- 12:03:13 kills #1, #2 broker simultaneously
- 12:03:42 starts #1, #2 broker again
- 12:04:01 kills #2, #3 broker simultaneously
- 12:04:42 starts #2 broker again