Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The current Nimbus HA implementation requires each Nimbus client to discover the Nimbus hosts by contacting ZooKeeper. To reduce the load on ZooKeeper, we could expose a Thrift API, as described in the future improvements section of the Nimbus HA design doc.
We will add a new field, nimbuses, to the ClusterSummary structure.
struct ClusterSummary {
  1: required list<SupervisorSummary> supervisors;
  2: required i32 nimbus_uptime_secs;
  3: required list<TopologySummary> topologies;
  4: required list<NimbusSummary> nimbuses;
}
struct NimbusSummary {
  1: required string host;
  2: required i32 port;
  3: required i32 uptimeSecs;
  4: required bool isLeader;
  5: required string version;
  // Needs a better name: the list of storm-ids for which this nimbus host has the code available locally.
  6: optional list<string> local_storm_ids;
}
We will add a nimbus.hosts configuration that serves as the seed list of Nimbus hosts. Any Nimbus host can serve read requests, so a client can issue a getClusterSummary call against any of them and extract the leader's NimbusSummary from the nimbuses list. All Nimbus hosts will cache this information to reduce the load on ZooKeeper.
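A minimal sketch of the client-side lookup described above, assuming nothing beyond the proposal: the client walks the seed list, asks the first reachable host for its view of the cluster, and picks the entry flagged as leader. The `SeedDiscovery` class, the trimmed `NimbusSummary` stand-in, and the `fetchNimbuses` function (a placeholder for a real getClusterSummary Thrift call) are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class SeedDiscovery {
    /** Hypothetical stand-in for the proposed Thrift NimbusSummary struct (fields trimmed). */
    static final class NimbusSummary {
        final String host;
        final int port;
        final boolean isLeader;

        NimbusSummary(String host, int port, boolean isLeader) {
            this.host = host;
            this.port = port;
            this.isLeader = isLeader;
        }
    }

    /**
     * Tries each seed host in order. fetchNimbuses is an assumed placeholder for
     * "call getClusterSummary on this host and return its nimbuses list"; it is
     * expected to throw a RuntimeException if the host cannot be reached.
     */
    static Optional<NimbusSummary> discoverLeader(
            List<String> seeds,
            Function<String, List<NimbusSummary>> fetchNimbuses) {
        for (String seed : seeds) {
            try {
                Optional<NimbusSummary> leader = fetchNimbuses.apply(seed).stream()
                        .filter(n -> n.isLeader)
                        .findFirst();
                if (leader.isPresent()) {
                    return leader;
                }
            } catch (RuntimeException e) {
                // Seed unreachable or failed to answer: fall through to the next seed.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Fake cluster view returned by any reachable host (example values only).
        List<NimbusSummary> clusterView = Arrays.asList(
                new NimbusSummary("nimbus-1", 6627, false),
                new NimbusSummary("nimbus-2", 6627, true));

        Optional<NimbusSummary> leader = discoverLeader(
                Arrays.asList("nimbus-1", "nimbus-2"),
                host -> clusterView);

        System.out.println(leader.map(n -> n.host + ":" + n.port).orElse("no leader found"));
    }
}
```

Because any host can answer the read request, the seed list only needs one reachable entry; the leader itself does not have to be among the seeds.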
In addition, we can add a RedirectException. When a request that only the leader Nimbus can serve (i.e. submit, kill, rebalance, deactivate, activate) is issued against a non-leader Nimbus, that host throws a RedirectException; the client handles it by refreshing its leader Nimbus host and retrying against that host.
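The redirect-and-retry handling could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: the `RedirectRetry` class, the `callOnLeader` helper, the `refreshLeader` supplier, and the bounded `maxAttempts` are hypothetical, and `RedirectException` is modeled here as an unchecked exception purely to keep the example short (an exception declared in Thrift would be generated as a checked one).

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class RedirectRetry {
    /** Hypothetical stand-in for the proposed RedirectException (unchecked here for brevity). */
    static final class RedirectException extends RuntimeException {
        RedirectException(String message) {
            super(message);
        }
    }

    /**
     * Runs a leader-only call against the cached leader host. On RedirectException
     * the leader is refreshed via refreshLeader and the call is retried, for at
     * most maxAttempts attempts in total.
     */
    static <T> T callOnLeader(
            String cachedLeader,
            Supplier<String> refreshLeader,
            Function<String, T> call,
            int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String host = cachedLeader;
        RedirectException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.apply(host);
            } catch (RedirectException e) {
                last = e;
                // The contacted host is not the leader: look the leader up again.
                host = refreshLeader.get();
            }
        }
        throw last; // Every attempt was redirected.
    }

    public static void main(String[] args) {
        // Simulated cluster in which "nimbus-2" is the current leader.
        String result = callOnLeader(
                "nimbus-1",
                () -> "nimbus-2",
                host -> {
                    if (!host.equals("nimbus-2")) {
                        throw new RedirectException("not the leader: " + host);
                    }
                    return "submitted via " + host;
                },
                3);
        System.out.println(result);
    }
}
```

Bounding the number of attempts is a design choice of this sketch; it keeps a client from looping indefinitely if leadership changes repeatedly while it retries.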
Issue Links
- duplicates STORM-534 Store Nimbus Server Information in zookeeper path {storm.zookeeper.root}/nimbus (Closed)