Description
The issue we saw is as follows:
1. The producer sends message 0 to topic-partition-0 on broker A. There is now 1 in-flight request to broker A.
2. The request is somehow lost.
3. The producer refreshes its topic metadata and finds that the leader of topic-partition-0 has migrated from broker A to broker B.
4. Because there is no in-flight request to broker B, all subsequent messages to topic-partition-0 in the record accumulator are sent to broker B.
5. Later, when the request from step (1) times out, message 0 is retried and sent to broker B. By this point all the later messages have already been sent, so the messages are reordered.
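The steps above can be reproduced with a small toy model. This is only a sketch, not Kafka's actual producer code: the function name `simulate`, the `accumulator` queue, and the `in_flight` map are hypothetical names chosen for illustration, and the model assumes that every request to broker B is acknowledged before the next one is sent.

```python
from collections import deque


def simulate(num_messages=4):
    """Toy model of the reordering scenario; NOT Kafka's real client logic."""
    # Messages queued for topic-partition-0, in the order the application sent them.
    accumulator = deque(range(num_messages))
    # Requests sent to each broker that have not been acknowledged yet.
    in_flight = {"A": [], "B": []}
    # Order in which messages end up appended to the partition.
    partition_log = []
    leader = "A"

    # Step 1: message 0 goes to broker A, the current leader.
    in_flight[leader].append(accumulator.popleft())

    # Step 2: the request is lost. No response arrives, so it stays in-flight
    # on A until the request timeout fires.

    # Step 3: a metadata refresh reports that leadership moved to broker B.
    leader = "B"

    # Step 4: broker B has no in-flight requests, so nothing blocks the
    # remaining messages. Each one is sent to B and acknowledged immediately.
    while accumulator:
        in_flight[leader].append(accumulator.popleft())
        partition_log.append(in_flight[leader].pop())

    # Step 5: the request from step 1 finally times out and message 0 is
    # retried against the new leader, landing after every later message.
    partition_log.append(in_flight["A"].pop())
    return partition_log


print(simulate())  # prints [1, 2, 3, 0]
```

In this model the in-flight accounting is tracked per broker rather than per partition, which is why the pending request on broker A does nothing to hold back sends to broker B.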
Issue Links
- blocks: KAFKA-3223 Add System (ducktape) Test that asserts strict partition ordering despite node failure (Open)