Apache Storm / STORM-3122

FNFE due to race condition between "async localizer" and "update blob" timer thread

      Description

      There is a race condition between the "async localizer" and the "update blob" timer thread.

      When a worker shuts down, the reference count for its blobs drops to 0 and the supervisor removes the actual blob files. Meanwhile, the "update blob" timer thread tries to keep blobs up to date for downloaded topologies. While updating a topology, it reads some of the blob files, assuming they have already been downloaded and are still present. The async localizer breaks that assumption, so the read can fail with a FileNotFoundException (FNFE).

      Arun Mahadevan suggested an approach to fix this: "updateBlobsForTopology" can simply catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. This looks like the simplest fix, so I'll provide a patch based on that suggestion.
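
A minimal sketch of the suggested approach, not the actual patch: only the method name "updateBlobsForTopology" and the exception type come from the description above. The class name, the `readBlobKeys` helper, the one-key-per-line file format, and the boolean return value are all assumptions made for illustration and do not reflect Storm's real code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlobUpdateSketch {

    // Hypothetical stand-in for reading the topology's stormconf and
    // enumerating its blob keys. Here the file simply holds one key per line.
    static List<String> readBlobKeys(File stormConf) throws IOException {
        List<String> keys = new ArrayList<>();
        // FileReader throws FileNotFoundException if the file has been removed.
        try (BufferedReader reader = new BufferedReader(new FileReader(stormConf))) {
            String line;
            while ((line = reader.readLine()) != null) {
                keys.add(line);
            }
        }
        return keys;
    }

    // Returns true if the update ran, false if it was skipped because the
    // stormconf had already been removed (e.g. the worker was shutting down).
    static boolean updateBlobsForTopology(File stormConf, List<String> updated)
            throws IOException {
        List<String> keys;
        try {
            keys = readBlobKeys(stormConf);
        } catch (FileNotFoundException e) {
            // The blob files were deleted concurrently; skip this round
            // instead of failing the timer thread.
            return false;
        }
        // Placeholder for the real update work: just record the keys.
        updated.addAll(keys);
        return true;
    }

    public static void main(String[] args) throws IOException {
        File conf = File.createTempFile("stormconf", ".txt");
        Files.write(conf.toPath(), Arrays.asList("blob-a", "blob-b"));
        List<String> updated = new ArrayList<>();

        System.out.println(updateBlobsForTopology(conf, updated)); // prints true

        // Simulate the supervisor removing the file before the next timer tick.
        conf.delete();
        System.out.println(updateBlobsForTopology(conf, updated)); // prints false
    }
}
```

Catching only FileNotFoundException keeps other I/O failures visible, so the skip applies just to the case where the file was removed before the timer thread read it.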

      Note that this doesn't apply to the master branch: there, all blobs are synced separately (there is no need to read the stormconf to enumerate topology-related blobs), and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).

              People

              • Assignee: Jungtaek Lim (kabhwan)
              • Reporter: Jungtaek Lim (kabhwan)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m