Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.1, 1.1.0
Fix Version/s: None
Description
I use a simple topology with one spout (9 workers) and one bolt (9 workers).
I have topology.backpressure.enable: false in storm.yaml.
The spouts emit about 10 000 000 tuples in 10 minutes, and max spout pending is 80 000.
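For reference, the same two settings expressed through the Java Config API look roughly like this (a simplified sketch, not my exact code):
{code:java}
import org.apache.storm.Config;

public class TopologyConfigSketch {
    // Builds the topology configuration described above:
    // backpressure disabled and max spout pending set to 80 000.
    public static Config buildConf() {
        Config conf = new Config();
        // Equivalent of "topology.backpressure.enable: false" in storm.yaml.
        conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
        // "Max spout pending is 80 000" -> topology.max.spout.pending.
        conf.setMaxSpoutPending(80_000);
        return conf;
    }
}
{code}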
The bolts buffer their tuples for 60 seconds, then flush them to the database and ack them in parallel (10 threads).
I read that OutputCollector can be used safely from multiple threads, so I use it that way.
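Simplified, the bolt looks roughly like this (a sketch, not my exact code; BufferingAckBolt and flushToDatabase are placeholder names and the real database logic is omitted):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

// Buffers tuples for 60 seconds, then flushes them to the database and
// acks them from a 10-thread pool that shares the single OutputCollector.
public class BufferingAckBolt extends BaseRichBolt {

    private transient OutputCollector collector;
    private transient ExecutorService flushPool;
    private transient ScheduledExecutorService flushTimer;
    private final List<Tuple> buffer = new ArrayList<>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.flushPool = Executors.newFixedThreadPool(10);
        this.flushTimer = Executors.newSingleThreadScheduledExecutor();
        // Every 60 seconds, hand the buffered tuples to the flush pool.
        this.flushTimer.scheduleAtFixedRate(this::flush, 60, 60, TimeUnit.SECONDS);
    }

    @Override
    public void execute(Tuple tuple) {
        synchronized (buffer) {
            buffer.add(tuple);   // acked later, after the batch is flushed
        }
    }

    private void flush() {
        final List<Tuple> batch;
        synchronized (buffer) {
            batch = new ArrayList<>(buffer);
            buffer.clear();
        }
        // Split the batch so flushing and acking really run on several threads
        // that all share the same OutputCollector instance.
        int chunk = Math.max(1, batch.size() / 10);
        for (int i = 0; i < batch.size(); i += chunk) {
            final List<Tuple> part = batch.subList(i, Math.min(batch.size(), i + chunk));
            flushPool.submit(() -> {
                flushToDatabase(part);     // placeholder for the real DB write
                for (Tuple t : part) {
                    collector.ack(t);      // ack from a pool thread
                }
            });
        }
    }

    private void flushToDatabase(List<Tuple> batch) {
        // placeholder for the real database flush
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt only acks, it does not emit
    }
}
{code}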
I don't have any bottleneck in the bolts (flushing to the database) or the spouts (Kafka spout), but about 2% of tuples fail due to the tuple processing timeout (the failures are recorded in the spout stats only).
I am sure that the bolts ack all tuples, but some of the acks never reach the spouts.
With multi-threaded acking I see many errors like this in the worker logs:
2016-12-01 13:21:10.741 o.a.s.u.DisruptorQueue [ERROR] NULL found in disruptor-executor[3 3]-send-queue:853877
I tried using a synchronized wrapper around OutputCollector to fix the error, but it didn't help.
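The wrapper was essentially this (a simplified sketch, not my exact code):
{code:java}
import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;

// Serializes all ack calls through one lock; this still did not stop
// the lost acks or the DisruptorQueue errors.
public class SynchronizedCollector {
    private final OutputCollector delegate;

    public SynchronizedCollector(OutputCollector delegate) {
        this.delegate = delegate;
    }

    public synchronized void ack(Tuple tuple) {
        delegate.ack(tuple);
    }
}
{code}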
I found a workaround that helps me: I still do all of the processing in the bolt in multiple threads, but I call OutputCollector.ack from one single separate thread.
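The workaround looks roughly like this (a simplified sketch, not my exact code; SingleThreadAcker is a placeholder name):
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;

// The worker threads only enqueue tuples; a single dedicated thread is the
// only caller of OutputCollector.ack.
public class SingleThreadAcker {

    private final BlockingQueue<Tuple> toAck = new LinkedBlockingQueue<>();
    private final Thread ackThread;

    public SingleThreadAcker(OutputCollector collector) {
        this.ackThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    collector.ack(toAck.take());   // the only thread touching the collector
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "single-acker");
        this.ackThread.setDaemon(true);
        this.ackThread.start();
    }

    // Called from the flush threads after the database write succeeds.
    public void ack(Tuple tuple) {
        toAck.add(tuple);
    }
}
{code}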
I think Storm has a bug in the multi-threaded use of OutputCollector.
If my topology has much less load, like 500 000 tuples per 10 minutes, then I don't lose any acks.