  1. Apache Storm
  2. STORM-3620

Data corruption can happen when components are multi-threaded because of non thread-safe serializer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels:
      None

      Description

      OutputCollector is not thread-safe in 2.x.

      It can cause data corruption if multiple threads in the same executor call OutputCollector to emit data at the same time:

      1. Every executor has an instance of ExecutorTransfer
      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146

      2. Every ExecutorTransfer has its own serializer

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44

      3. Every executor has its own outputCollector

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147

      4. When outputCollector is called to emit to remote workers, it uses ExecutorTransfer to transfer data

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66

      5. That transfer tries to serialize the data

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116

      6. But the serializer is not thread-safe

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
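
      The chain above can be illustrated with a minimal, self-contained sketch. It does not use Storm classes: ReusedBufferSerializer and SharedSerializerDemo are hypothetical names, and the serializer is only assumed to resemble the linked one in that it reuses a single output buffer across calls. Several threads share one instance, as threads inside one executor would share the executor's serializer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a serializer that reuses one output buffer
// across calls. It is NOT Storm's KryoTupleSerializer.
class ReusedBufferSerializer {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    byte[] serialize(String payload) {
        buffer.reset();                      // clear the shared buffer
        for (byte b : payload.getBytes(StandardCharsets.UTF_8)) {
            buffer.write(b);                 // another thread may reset or write in between
        }
        return buffer.toByteArray();         // may now hold bytes from other threads
    }
}

class SharedSerializerDemo {
    // Sends perThread payloads from each thread through ONE serializer and
    // returns how many serialized results differ from their input.
    static int countCorrupted(int threads, int perThread) {
        ReusedBufferSerializer shared = new ReusedBufferSerializer();
        AtomicInteger corrupted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            String payload = "tuple-from-thread-" + t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    String roundTrip =
                            new String(shared.serialize(payload), StandardCharsets.UTF_8);
                    if (!roundTrip.equals(payload)) {
                        corrupted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  " + countCorrupted(1, 10_000) + " corrupted");
        System.out.println("4 threads: " + countCorrupted(4, 10_000) + " corrupted");
    }
}
```

      With a single thread the count is always zero. With several threads the count depends on scheduling, so a given run may or may not show corruption; the point is only that nothing in the serializer prevents it.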


      But the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html says that OutputCollector is thread-safe:

      Its perfectly fine to launch new threads in bolts that do processing asynchronously. OutputCollector is thread-safe and can be called at any time.
      

      We should either make OutputCollector thread-safe or update the documentation so that it does not mislead users.
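
      For the first option, two common ways to make a buffer-reusing serializer safe to call from several threads are sketched below: giving each calling thread its own instance, or guarding a shared instance with a lock. This is an illustration only. It is not the change that was made for 2.2.0 (this description does not say which approach was taken), and every class name in it is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical non-thread-safe serializer that reuses one buffer (not a Storm class).
class BufferReusingSerializer {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    byte[] serialize(String payload) {
        buffer.reset();
        for (byte b : payload.getBytes(StandardCharsets.UTF_8)) {
            buffer.write(b);
        }
        return buffer.toByteArray();
    }
}

// Option A: each calling thread lazily gets its own serializer, so no buffer is shared.
class PerThreadSerializer {
    private final ThreadLocal<BufferReusingSerializer> local =
            ThreadLocal.withInitial(BufferReusingSerializer::new);

    byte[] serialize(String payload) {
        return local.get().serialize(payload);
    }
}

// Option B: one shared serializer, with calls serialized by the object's monitor.
class LockedSerializer {
    private final BufferReusingSerializer delegate = new BufferReusingSerializer();

    synchronized byte[] serialize(String payload) {
        return delegate.serialize(payload);
    }
}

class ThreadSafeSerializerDemo {
    // Runs the given serialize function from several threads and returns how
    // many serialized results differ from their input.
    static int countCorrupted(Function<String, byte[]> serialize, int threads, int perThread) {
        AtomicInteger corrupted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            String payload = "tuple-from-thread-" + t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    String roundTrip =
                            new String(serialize.apply(payload), StandardCharsets.UTF_8);
                    if (!roundTrip.equals(payload)) {
                        corrupted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        PerThreadSerializer optionA = new PerThreadSerializer();
        LockedSerializer optionB = new LockedSerializer();
        System.out.println("thread-local: " + countCorrupted(optionA::serialize, 4, 10_000));
        System.out.println("synchronized: " + countCorrupted(optionB::serialize, 4, 10_000));
    }
}
```

      The thread-local variant avoids lock contention at the cost of one serializer per calling thread; the synchronized variant keeps a single instance but makes concurrent emits wait on each other.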

              People

              • Assignee: Ethan Li (ethanli)
              • Reporter: Ethan Li (ethanli)
                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m