Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-3122

FNFE due to race condition between "async localizer" and "update blob" timer thread

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

      Description

      There's race condition between "async localizer" and "update blob" timer thread.

      When worker is shutting down, reference count for blob will be 0 and supervisor will remove actual blob file. There's also "update blob" timer thread which tries to keep blobs updated for downloaded topologies. While updating topology it should read some of blob files already downloaded assuming these files should be downloaded before, and the assumption is broken because of async localizer.

      Arun Mahadevan suggested an approach to fix this: "updateBlobsForTopology" can just catch the FIleNotFoundException and skip updating the blobs in case it can't find the stormconf, and the approach looks simplest fix so I'll provide a patch based on suggestion.

      Btw, it doesn't apply to master branch, since in master branch all blobs are synced up separately (no need to read stormconf to enumerate topology related blobs), and update logic is already fault-tolerance (skip to next sync when it can't pull the blob).

        Attachments

          Activity

            People

            • Assignee:
              kabhwan Jungtaek Lim
              Reporter:
              kabhwan Jungtaek Lim

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 20m
                1h 20m

                  Issue deployment