Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.0, 2.1.0
- None
- None
Description
OutputCollector is not thread-safe in 2.x.
It can cause data corruption if multiple threads in the same executor call OutputCollector to emit data at the same time:
1. Every executor has an instance of ExecutorTransfer
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146
2. Every ExecutorTransfer has its own serializer
3. Every executor has its own outputCollector
4. When outputCollector is called to emit to remote workers, it uses ExecutorTransfer to transfer the data
5. ExecutorTransfer then tries to serialize the data
6. But the serializer is not thread-safe
However, the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html says that OutputCollector is thread-safe:
"Its perfectly fine to launch new threads in bolts that do processing asynchronously. OutputCollector is thread-safe and can be called at any time."
We should either make OutputCollector thread-safe or update the documentation so that it does not mislead users.
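To make the failure mode and one possible guard concrete, here is a minimal, self-contained sketch. It does not use Storm classes: ReusedBufferSerializer and Collector are hypothetical stand-ins for the per-executor serializer and the collector described above, and the lock is only one assumed way to serialize access, not necessarily how the issue was fixed in Storm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedEmitSketch {

    /** Hypothetical serializer that reuses one internal buffer, so it is not thread-safe. */
    static final class ReusedBufferSerializer {
        private final StringBuilder buffer = new StringBuilder();

        String serialize(String value) {
            buffer.setLength(0);                 // reset the shared buffer
            for (char c : value.toCharArray()) { // multi-step write that other threads could interleave with
                buffer.append(c);
            }
            return buffer.toString();
        }
    }

    /** Hypothetical collector: one serializer per instance, guarded by a lock on emit. */
    static final class Collector {
        private final ReusedBufferSerializer serializer = new ReusedBufferSerializer();
        private final List<String> sent = new ArrayList<>();
        private final Object lock = new Object();

        void emit(String value) {
            synchronized (lock) {                // only one thread may serialize at a time
                sent.add(serializer.serialize(value));
            }
        }

        List<String> sent() {
            synchronized (lock) {
                return new ArrayList<>(sent);
            }
        }
    }

    /** Emits from several threads and returns how many emitted values arrived uncorrupted. */
    static int countIntact(int threads, int perThread) {
        Collector collector = new Collector();
        Set<String> expected = new HashSet<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String payload = "tuple-from-thread-" + t;
            expected.add(payload);
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    collector.emit(payload);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("emit threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int intact = 0;
        for (String s : collector.sent()) {
            if (expected.contains(s)) {
                intact++;
            }
        }
        return intact;
    }

    public static void main(String[] args) {
        System.out.println("intact tuples: " + countIntact(4, 1000));
    }
}
```

With the synchronized block in place, every emitted value comes back intact. Removing the lock in this sketch lets threads interleave writes to the shared buffer, which is the kind of corruption described in this issue. A per-thread serializer would be another option that avoids lock contention.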
Issue Links
- is related to: STORM-3646 Flush only happens on executor main thread (Closed)