Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version: 2.6.1
Description
Hey guys, I'm using Storm v2.6.1, and every time I restart the Nimbus leader (I currently run 3 Nimbus nodes for high availability) the workers get reassigned. This is bad behaviour: every topology is left with no workers running for a period (until new workers are assigned) purely because of a Nimbus leadership change.
Update:
Essentially, because the modTime is used as the blob version, we have found that with LocalFsBlobStoreFile the following occurs every time the Nimbus leader goes down:
- Nimbus (1), the leader, goes down and a new Nimbus (2) picks up the leadership.
- If the blobs on Nimbus (2) have a different modTime, the workers are restarted (even though the blob contents may be identical).
- Nimbus (1) comes back up, syncs the blobs on startup and updates their modTime, since it downloads the blobs again.
- If Nimbus (2) then goes down, all the workers are restarted again, as Nimbus (1) has new modTimes.
- This can repeat endlessly, as the modTime will always differ between Nimbus leaders.
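The cycle above can be demonstrated with a minimal, dependency-free sketch (not Storm code; `VersionDemo` and `contentVersion` are hypothetical names, and `java.security.MessageDigest` stands in for commons-codec's DigestUtils): two byte-identical copies of a blob, downloaded at different times, disagree on modTime but agree on a content-based version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.util.Arrays;

public class VersionDemo {
    // Hypothetical content-based version: SHA-1 of the file bytes, reduced to
    // a long via Arrays.hashCode.
    static long contentVersion(Path p) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return Arrays.hashCode(md.digest(Files.readAllBytes(p)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = "topology-jar-bytes".getBytes(StandardCharsets.UTF_8);

        // The same blob stored on two Nimbus nodes: identical bytes, but the
        // second copy was (re)downloaded later, so its modTime differs.
        Path onNimbus1 = Files.write(Files.createTempFile("blob-nimbus1", ".jar"), blob);
        Path onNimbus2 = Files.write(Files.createTempFile("blob-nimbus2", ".jar"), blob);
        Files.setLastModifiedTime(onNimbus2,
                FileTime.fromMillis(Files.getLastModifiedTime(onNimbus1).toMillis() + 5000));

        // modTime-based versions differ, so workers would be restarted;
        // content-based versions match, so no restart would be needed.
        System.out.println("modTimesEqual="
                + Files.getLastModifiedTime(onNimbus1).equals(Files.getLastModifiedTime(onNimbus2)));
        System.out.println("contentVersionsEqual="
                + (contentVersion(onNimbus1) == contentVersion(onNimbus2)));
    }
}
```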
We suggest adding a new method that returns the file version:
public abstract class BlobStoreFile {
    public abstract long getModTime() throws IOException;

    public long getVersion() throws IOException {
        return getModTime();
    }
}
This defaults to the current approach (modTime) when not overridden, and for LocalFsBlobStoreFile the version would be something along the lines of:
public long getVersion() throws IOException {
    // try-with-resources so the stream is closed after hashing
    try (InputStream in = new FileInputStream(path)) {
        byte[] bytes = DigestUtils.sha1(in);
        return Arrays.hashCode(bytes);
    }
}
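To make the override pattern concrete, here is a self-contained sketch of the proposal (the `Sketch*` class names are hypothetical, not the real Storm classes, and `java.security.MessageDigest` replaces commons-codec's DigestUtils to avoid the external dependency): the base class keeps modTime as the default version, and a local-FS subclass overrides `getVersion()` with a content hash so byte-identical blobs get the same version on every Nimbus node.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

abstract class SketchBlobStoreFile {
    public abstract long getModTime() throws IOException;

    // Default: fall back to the current modTime-based behaviour.
    public long getVersion() throws IOException {
        return getModTime();
    }
}

class SketchLocalFsBlobStoreFile extends SketchBlobStoreFile {
    private final Path path;

    SketchLocalFsBlobStoreFile(Path path) {
        this.path = path;
    }

    @Override
    public long getModTime() throws IOException {
        return Files.getLastModifiedTime(path).toMillis();
    }

    // Content-based version: identical bytes yield the same version,
    // regardless of when the blob was downloaded on this node.
    @Override
    public long getVersion() throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return Arrays.hashCode(md.digest(Files.readAllBytes(path)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }
}
```

With this in place, two copies of the same blob with different modTimes report the same version, so a leadership change would no longer trigger worker restarts for unchanged blobs.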
Soon, I'll open the PR and link it here.