Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version: 2.6.1
Description
Hey guys, I'm using Storm v2.6.1, and every time I restart the Nimbus leader (I currently run 3 Nimbus nodes for high availability) the workers get reassigned. This is bad behaviour: every topology is left with no workers running for a period (until new workers are assigned) purely because of a Nimbus leadership change.
Update:
Essentially, because the modTime is used as the blob version, we have found that with LocalFsBlobStoreFile the following occurs every time the Nimbus leader goes down:
- Nimbus (1), the leader, goes down and a new Nimbus (2) picks up the leadership.
- If the blobs on Nimbus (2) have a different modTime, the workers are restarted (even though the blob contents may be identical).
- Nimbus (1) comes back up, syncs the blobs on startup and updates their modTime, since it downloads the blobs again.
- If Nimbus (2) then goes down, all the workers are restarted again, as Nimbus (1) has new modTimes.
- This can repeat endlessly, as the modTime will always differ between Nimbus leaders.
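The cycle above can be demonstrated with a minimal, dependency-free sketch (not Storm code; `VersionDemo` and `contentVersion` are hypothetical names, and `java.security.MessageDigest` stands in for commons-codec's DigestUtils): two byte-identical copies of a blob, downloaded at different times, disagree on modTime but agree on a content-based version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.util.Arrays;

public class VersionDemo {
    // Hypothetical content-based version: SHA-1 of the file bytes, reduced to
    // a long via Arrays.hashCode.
    static long contentVersion(Path p) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return Arrays.hashCode(md.digest(Files.readAllBytes(p)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = "topology-jar-bytes".getBytes(StandardCharsets.UTF_8);

        // The same blob stored on two Nimbus nodes: identical bytes, but the
        // second copy was (re)downloaded later, so its modTime differs.
        Path onNimbus1 = Files.write(Files.createTempFile("blob-nimbus1", ".jar"), blob);
        Path onNimbus2 = Files.write(Files.createTempFile("blob-nimbus2", ".jar"), blob);
        Files.setLastModifiedTime(onNimbus2,
                FileTime.fromMillis(Files.getLastModifiedTime(onNimbus1).toMillis() + 5000));

        // modTime-based versions differ, so workers would be restarted;
        // content-based versions match, so no restart would be needed.
        System.out.println("modTimesEqual="
                + Files.getLastModifiedTime(onNimbus1).equals(Files.getLastModifiedTime(onNimbus2)));
        System.out.println("contentVersionsEqual="
                + (contentVersion(onNimbus1) == contentVersion(onNimbus2)));
    }
}
```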
We suggest adding a new method that returns the file version:
public abstract class BlobStoreFile {
    public abstract long getModTime() throws IOException;

    public long getVersion() throws IOException {
        return getModTime();
    }
}
This defaults to the current approach (modTime) when not overridden, and for LocalFsBlobStoreFile the version would be something along the lines of:
public long getVersion() throws IOException {
    // try-with-resources so the stream is closed after hashing
    try (InputStream in = new FileInputStream(path)) {
        byte[] bytes = DigestUtils.sha1(in);
        return Arrays.hashCode(bytes);
    }
}
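To make the override pattern concrete, here is a self-contained sketch of the proposal (the `Sketch*` class names are hypothetical, not the real Storm classes, and `java.security.MessageDigest` replaces commons-codec's DigestUtils to avoid the external dependency): the base class keeps modTime as the default version, and a local-FS subclass overrides `getVersion()` with a content hash so byte-identical blobs get the same version on every Nimbus node.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

abstract class SketchBlobStoreFile {
    public abstract long getModTime() throws IOException;

    // Default: fall back to the current modTime-based behaviour.
    public long getVersion() throws IOException {
        return getModTime();
    }
}

class SketchLocalFsBlobStoreFile extends SketchBlobStoreFile {
    private final Path path;

    SketchLocalFsBlobStoreFile(Path path) {
        this.path = path;
    }

    @Override
    public long getModTime() throws IOException {
        return Files.getLastModifiedTime(path).toMillis();
    }

    // Content-based version: identical bytes yield the same version,
    // regardless of when the blob was downloaded on this node.
    @Override
    public long getVersion() throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return Arrays.hashCode(md.digest(Files.readAllBytes(path)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }
}
```

With this in place, two copies of the same blob with different modTimes report the same version, so a leadership change would no longer trigger worker restarts for unchanged blobs.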
Soon, I'll open the PR and link it here.