Details
Bug
Status: Resolved
Critical
Resolution: Fixed
1.x
Description
There's a race condition between the "async localizer" and the "update blob" timer thread.
When a worker is shutting down, the reference count for its blobs drops to 0 and the supervisor removes the actual blob files. Meanwhile, the "update blob" timer thread tries to keep blobs updated for downloaded topologies. While updating a topology, it reads some of the blob files on the assumption that they have already been downloaded, and the async localizer breaks that assumption.
arunmahadevan suggested an approach to fix this: "updateBlobsForTopology" can just catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. This looks like the simplest fix, so I'll provide a patch based on the suggestion.
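To illustrate the suggestion, here is a minimal sketch and not the actual patch. Only the name updateBlobsForTopology and the use of FileNotFoundException come from this issue. The class UpdateBlobsSketch, the helper readBlobKeys, the one-key-per-line file format and the boolean return value are assumptions made up for the example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the supervisor's blob update logic.
class UpdateBlobsSketch {

    // Assumed helper: reads blob keys from a local "stormconf" file,
    // one key per line. FileReader throws FileNotFoundException when the
    // file has already been removed by another thread.
    static List<String> readBlobKeys(File stormConf) throws IOException {
        List<String> keys = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(stormConf))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.trim();
                if (!key.isEmpty()) {
                    keys.add(key);
                }
            }
        }
        return keys;
    }

    // Returns true if the topology's blobs were updated, false if the update
    // was skipped because the stormconf was no longer on disk.
    static boolean updateBlobsForTopology(File stormConf, List<String> updated)
            throws IOException {
        List<String> keys;
        try {
            keys = readBlobKeys(stormConf);
        } catch (FileNotFoundException e) {
            // The topology's files were cleaned up concurrently;
            // skip this round instead of failing the timer thread.
            return false;
        }
        // Placeholder for the real update: just record the keys.
        updated.addAll(keys);
        return true;
    }

    public static void main(String[] args) throws IOException {
        File conf = File.createTempFile("stormconf", ".txt");
        Files.write(conf.toPath(), Arrays.asList("blob-a", "blob-b"));

        List<String> updated = new ArrayList<>();
        System.out.println("updated: " + updateBlobsForTopology(conf, updated));

        // Simulate the supervisor removing the file during worker shutdown.
        Files.delete(conf.toPath());
        System.out.println("updated: " + updateBlobsForTopology(conf, new ArrayList<>()));
    }
}
```

Only FileNotFoundException is caught here, so other I/O errors still propagate to the caller. Whether the real patch should swallow anything broader is a separate decision.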
Btw, this doesn't apply to the master branch: there, all blobs are synced separately (no need to read stormconf to enumerate topology-related blobs), and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).