Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-3122

FNFE due to race condition between "async localizer" and "update blob" timer thread

    XMLWordPrintableJSON

Details

    Description

      There's race condition between "async localizer" and "update blob" timer thread.

      When worker is shutting down, reference count for blob will be 0 and supervisor will remove actual blob file. There's also "update blob" timer thread which tries to keep blobs updated for downloaded topologies. While updating topology it should read some of blob files already downloaded assuming these files should be downloaded before, and the assumption is broken because of async localizer.

      arunmahadevan suggested an approach to fix this: "updateBlobsForTopology" can just catch the FIleNotFoundException and skip updating the blobs in case it can't find the stormconf, and the approach looks simplest fix so I'll provide a patch based on suggestion.

      Btw, it doesn't apply to master branch, since in master branch all blobs are synced up separately (no need to read stormconf to enumerate topology related blobs), and update logic is already fault-tolerance (skip to next sync when it can't pull the blob).

      Attachments

        Issue Links

          Activity

            People

              kabhwan Jungtaek Lim
              kabhwan Jungtaek Lim
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m