Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.0.2, 1.x
- Fix Version/s: None
- Component/s: None
Description
Kafka spout hangs when the number of uncommitted messages exceeds the max allowed uncommitted messages and some intermediate tuples have failed in a downstream bolt.
Steps to reproduce:
- Create a simple topology with one Kafka spout and a slow bolt.
- In the Kafka spout, set the maximum uncommitted messages to a small number such as 100.
- The bolt should process 10 tuples per second and be programmed to fail some tuples, e.g. say tuple number 10 fails. Also assume there is only 1 Kafka partition the spout reads from. (A reproduction sketch in code follows this walkthrough.)
- On its first execution of nextTuple(), the spout gets 110 records and emits them. At this point the number of uncommitted messages is 110.
- The first 9 tuples are acked by the bolt. The 10th tuple is failed by the bolt, and KafkaSpout puts it on the retry queue.
- Tuples 11 to 110 are acked by the bolt, but the spout can only commit up to offset 9.
- Now the number of uncommitted messages is 110 - 9 = 101 > 100 (the max allowed uncommitted messages).
- No new records are polled from Kafka, and the spout is stuck, since nothing is polled.
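Below is a minimal reproduction sketch of the setup described above. The broker address localhost:9092, the topic test-topic, and the class names are hypothetical placeholders, and it uses the storm-kafka-client 1.1.x-style KafkaSpoutConfig builder (the 1.0.x builder takes different arguments), so treat the spout configuration as approximate rather than exact.

```java
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class StuckSpoutRepro {

    // Slow bolt: processes ~10 tuples per second and fails tuple number 10.
    public static class SlowFailingBolt extends BaseRichBolt {
        private OutputCollector collector;
        private long count = 0;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                Thread.sleep(100); // roughly 10 tuples per second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (++count == 10) {
                collector.fail(tuple); // e.g. tuple number 10 fails
            } else {
                collector.ack(tuple);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no output streams
        }
    }

    public static void main(String[] args) throws Exception {
        // maxUncommittedOffsets is kept small so the limit is exceeded quickly.
        KafkaSpoutConfig<String, String> spoutConfig =
            KafkaSpoutConfig.builder("localhost:9092", "test-topic")
                .setMaxUncommittedOffsets(100)
                .build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        builder.setBolt("slow-bolt", new SlowFailingBolt(), 1).shuffleGrouping("kafka-spout");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("stuck-spout-repro", new Config(), builder.createTopology());
        Thread.sleep(120_000); // watch the spout stall after the failed tuple
        cluster.shutdown();
    }
}
```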
The solution is to explicitly go through the retry queue on every nextTuple() and emit the tuples that are ready for retry.
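As a rough illustration only (this is not the actual KafkaSpout code; the names retryQueue, numUncommittedOffsets, emitRetry, and pollAndEmitNewRecords are hypothetical), the intended nextTuple() control flow would drain the retry queue unconditionally and apply the uncommitted-messages limit only to polling new records:

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Illustrative sketch of the proposed nextTuple() flow; not the real
 * KafkaSpout implementation.
 */
public class RetryAwareSpoutSketch {

    private static final int MAX_UNCOMMITTED_OFFSETS = 100;

    private final Queue<Long> retryQueue = new ArrayDeque<>(); // offsets whose retry backoff has expired
    private long numUncommittedOffsets = 0;                    // emitted but not yet committed

    public void nextTuple() {
        // 1. Always re-emit retry-ready tuples, even when the uncommitted limit
        //    has been reached, so the failed tuple can eventually be acked and
        //    the commit offset can advance.
        Long retryOffset;
        while ((retryOffset = retryQueue.poll()) != null) {
            emitRetry(retryOffset);
        }

        // 2. Only polling for brand-new records is gated by the limit.
        if (numUncommittedOffsets < MAX_UNCOMMITTED_OFFSETS) {
            pollAndEmitNewRecords();
        }
    }

    private void emitRetry(long offset) {
        // hypothetical: re-emit the tuple for this offset (does not add to numUncommittedOffsets)
    }

    private void pollAndEmitNewRecords() {
        // hypothetical: poll Kafka, emit the records, and increase numUncommittedOffsets
    }
}
```

In the scenario above, this would let tuple 10 be re-emitted and eventually acked, after which the spout can commit past offset 9, the uncommitted count drops below the limit, and polling resumes.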
Issue Links
- is related to
  - STORM-2546 Kafka spout can stall / get stuck due to edge case with failing tuples (Resolved)
  - STORM-2549 The fix for STORM-2343 is incomplete, and the spout can still get stuck on failed tuples (Resolved)