Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
A failure in the test below may indicate a genuine missing message.
kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.security_protocol=PLAINTEXT
The test, which reassigns partitions whilst bouncing cluster members, reconciles the messages acknowledged to the producer with the messages received by the consumer.
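The reconciliation step amounts to a set difference. The following is a minimal sketch, not the kafkatest implementation. It assumes each message is identified by its value string, as in the producer events in this report. `find_missing` and the sample values are hypothetical.

```python
def find_missing(acked_values, consumed_values):
    """Return values the producer saw acknowledged but the consumer never received.

    Hypothetical helper: assumes message values are unique identifiers.
    """
    return sorted(set(acked_values) - set(consumed_values))


# Hypothetical sample data: one acknowledged value never reaches the consumer.
acked = ["7445", "7446", "7447", "7487"]
consumed = ["7445", "7446", "7487"]
print(find_missing(acked, consumed))  # ['7447']
```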
The interesting part is that we received two acks for the same offset, each with a different message:
{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}
{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}
When searching the log files with kafka.tools.DumpLogSegments, only the later message (7487) is found.
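Conflicting acks like the pair above can be found by grouping `producer_send_success` events by (topic, partition, offset) and flagging any offset acknowledged with more than one distinct value. This is a hedged sketch that assumes the producer output is one JSON object per line, in the format shown. `conflicting_acks` is a hypothetical helper, not part of kafkatest.

```python
import json
from collections import defaultdict


def conflicting_acks(lines):
    """Map (topic, partition, offset) -> sorted values, for offsets acked with >1 distinct value."""
    seen = defaultdict(set)
    for line in lines:
        event = json.loads(line)
        # Only successful sends carry an acknowledged offset.
        if event.get("name") != "producer_send_success":
            continue
        key = (event["topic"], event["partition"], event["offset"])
        seen[key].add(event["value"])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# The two events reported in this issue.
events = [
    '{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}',
    '{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}',
]
print(conflicting_acks(events))
# {('test_topic', 11, 372): ['7447', '7487']}
```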
The missing message lies midway through the test and appears to occur just after a leader moves: after 7447 is sent there is a pause of about 1s, then 7487 is sent along with a backlog of messages for partitions 11, 16 and 6.
The overall implication is that a message appears to be acknowledged but is later lost.
The test itself looks valid. The producer is initialised with acks = -1. Its callback checks for an exception in onCompletion and uses this to track acknowledgement in the test.
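In the Java client, `Callback.onCompletion(RecordMetadata metadata, Exception exception)` receives a non-null exception when a send fails. With acks = -1 (equivalently "all"), a send without an exception means the leader reported that the full in-sync replica set had the write. The sketch below mirrors that exception check in Python to show the bookkeeping. `AckTracker`, its `value` argument and the `(partition, offset)` metadata tuple are hypothetical stand-ins, not the verifiable producer's actual code.

```python
class AckTracker:
    """Hypothetical tracker: a send counts as acknowledged only if no exception is reported."""

    def __init__(self):
        self.acked = []   # (value, metadata) pairs for successful sends
        self.failed = []  # (value, exception) pairs for failed sends

    def on_completion(self, value, metadata, exception):
        # Mirrors the Java pattern: exception is None on success.
        if exception is None:
            self.acked.append((value, metadata))
        else:
            self.failed.append((value, exception))


tracker = AckTracker()
tracker.on_completion("7447", (11, 372), None)                 # success at partition 11, offset 372
tracker.on_completion("7448", None, TimeoutError("expired"))   # hypothetical failed send
print([value for value, _ in tracker.acked])  # ['7447']
```

Only successful completions feed the "acked" set that is later reconciled against the consumer, so a message counted here but absent from the log is a genuine loss rather than a test artefact.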
https://jenkins.confluent.io/job/system-test-kafka/521/console
http://testing.confluent.io/confluent-kafka-system-test-results/?prefix=2017-03-01--001.1488363091--apache--trunk--c9872cb/ReassignPartitionsTest/test_reassign_partitions/bounce_brokers=True.security_protocol=PLAINTEXT/