In some of the testing Stack and I have been doing, we've uncovered issues with concurrent RS failures and with the Master under heavy load. In these situations we end up handling ZK events long after they actually occur, which has exposed problems in our timeout logic.
This jira is about reviewing the timeout semantics, especially around ZK usage, and ensuring that we handle things appropriately.
|Field||Original Value||New Value|
|Assignee|| ||Jonathan Gray [ streamy ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Resolution|| ||Fixed [ 1 ]|