Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
None
Description
The current system test framework does not work well when the existing tests run with the new consumer protocol. I have seen two main issues.
First, some tests assume that no rebalance is triggered, for instance consumer_test.py::test_consumer_failure:
    # verify that there were no rebalances on failover
    assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"
The current framework calculates num_rebalances by incrementing it by one every time a new assignment is received, so if a reconciliation happens during the failover, num_rebalances is incremented as well and the assertion fails. For the new protocol, we need a different way to update num_rebalances.
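The counting problem can be illustrated with a toy model in Python. Everything in it is a hypothetical simplification, not the kafkatest or VerifiableConsumer code: it assumes each assignment event carries the member's full assignment, and that a reconciliation during failover re-delivers an unchanged assignment. It contrasts the current per-event counter with one possible alternative, counting only assignments that differ from the previous one.

```python
# Toy model of rebalance counting in a system-test harness.
# Assumptions (hypothetical, not the kafkatest or VerifiableConsumer API):
#   * each "partitions assigned" event carries the member's full assignment;
#   * a reconciliation during broker failover re-delivers that assignment.


class RebalanceCounter:
    def __init__(self):
        self.assignment = frozenset()
        # Current scheme: +1 for every assignment event received.
        self.per_event = 0
        # Hypothetical alternative: +1 only when the assignment actually changes.
        self.per_change = 0

    def on_partitions_assigned(self, partitions):
        new_assignment = frozenset(partitions)
        self.per_event += 1
        if new_assignment != self.assignment:
            self.per_change += 1
        self.assignment = new_assignment


counter = RebalanceCounter()
counter.on_partitions_assigned([("test_topic", 0), ("test_topic", 1)])  # initial join
before = (counter.per_event, counter.per_change)

# Broker failover: a reconciliation delivers the same assignment again.
counter.on_partitions_assigned([("test_topic", 0), ("test_topic", 1)])

print(counter.per_event - before[0])   # 1: the "no rebalance" assertion would fail
print(counter.per_change - before[1])  # 0: the unchanged assignment is not counted
```

The alternative has a blind spot: a genuine rebalance that happens to produce the same assignment goes uncounted, so a real fix may need a different signal from the consumer.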
Second, for the new protocol, we need a way to make sure all members have joined and stabilized. Currently we only check that all members have joined (the event handlers are all in the Joined state), at which point some partitions may not have been assigned yet and more time is needed for reconciliation. This can cause failures such as timeouts while waiting for consumption, or failed assertions like:
    partition_owner = consumer.owner(partition)
    assert partition_owner is not None
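A minimal sketch of a stronger readiness check, in Python like the test snippets above. The helper names, the "Joined" state string and the shape of the assignment map (member id to owned partitions) are hypothetical, not the kafkatest API; the sketch only shows the idea of waiting until every partition has exactly one owner in addition to every member having joined.

```python
import time
from collections import Counter


def all_partitions_assigned(assignments, partitions):
    """True if every partition is owned by exactly one member.

    `assignments` (hypothetical shape) maps a member id to the partitions it owns.
    """
    owned = Counter(p for parts in assignments.values() for p in parts)
    return owned == Counter(partitions)


def wait_until_stable(get_states, get_assignments, partitions,
                      timeout_sec=30.0, poll_sec=0.5):
    """Poll until all members are Joined AND every partition has one owner.

    Returns True once stable, or False if `timeout_sec` elapses first.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        joined = all(state == "Joined" for state in get_states().values())
        # Joined alone is not enough with the new protocol: partitions may
        # still be unassigned while members reconcile, so also require full
        # single-owner coverage before declaring the group stable.
        if joined and all_partitions_assigned(get_assignments(), partitions):
            return True
        time.sleep(poll_sec)
    return False
```

Unlike a fixed time.sleep, a polling wait like this returns as soon as the group has settled and fails explicitly when it does not settle within the timeout.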
As a short-term workaround, we can make the tests pass by adding time.sleep calls or by skipping the num_rebalances check. To truly fix them, we should adjust tools/src/main/java/org/apache/kafka/tools/VerifiableConsumer.java to work well with the new protocol.
Issue Links
- is a child of: KAFKA-17183 New consumer system tests pass for subset of tests, but fail if running all tests (Resolved)
- is related to: KAFKA-16576 New consumer fails with assert in consumer_test.py’s test_consumer_failure system test (Resolved)