Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Description
When Nimbus regains leadership, the leaderCallback syncs its state with ZooKeeper:
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/nimbus/LeaderListenerCallback.java#L106
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L212
When a topology is killed, both the ZooKeeper state and the in-memory assignments map are cleaned up.
However, the syncRemoteAssignments call first reads the information from ZooKeeper into stormIds. Then, after some processing (including deserialization), it puts the result into the local in-memory assignments backend. If the ZooKeeper deletion happens between these two steps, the local backend ends up holding an assignment that no longer exists in ZooKeeper, so the remote and local backends disagree.
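The window between the two steps can be illustrated with a small, deterministic sketch. It is not the Storm implementation: the class SyncRaceSketch, its two maps (standing in for ZooKeeper and the in-memory backend), and the betweenSteps hook that forces the deletion into the gap are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the race; names do not correspond to Storm classes.
final class SyncRaceSketch {
    // Stand-in for the assignments stored in ZooKeeper.
    final Map<String, String> remote = new HashMap<>();
    // Stand-in for the local in-memory assignments backend.
    final Map<String, String> local = new HashMap<>();

    // Step 1 reads a snapshot from the remote store; step 2 writes it locally.
    // betweenSteps runs in the gap, simulating a concurrent topology kill.
    void syncRemoteToLocal(Runnable betweenSteps) {
        Map<String, String> snapshot = new HashMap<>(remote); // step 1: read
        betweenSteps.run();                                   // concurrent activity
        local.putAll(snapshot);                               // step 2: write
    }

    // Killing a topology cleans up both stores.
    void killTopology(String id) {
        remote.remove(id);
        local.remove(id);
    }

    // Ids present locally but no longer present remotely.
    List<String> staleLocalIds() {
        List<String> stale = new ArrayList<>();
        for (String id : local.keySet()) {
            if (!remote.containsKey(id)) {
                stale.add(id);
            }
        }
        return stale;
    }
}
```

Calling syncRemoteToLocal with a hook that kills a topology leaves that topology's id in the local map even though it was removed from both stores during the kill, which is the mismatch described above.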
We found this issue because we observed an NPE while making assignments:
2020-11-04 19:56:17.703 o.a.s.d.n.Nimbus timer [ERROR] Error while processing event
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$17(Nimbus.java:1419) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.StormTimer$1.run(StormTimer.java:110) ~[storm-client-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:226) [storm-client-2.3.0.y.jar:2.3.0.y]
Caused by: java.lang.NullPointerException
    at org.apache.storm.daemon.nimbus.HeartbeatCache.getAliveExecutors(HeartbeatCache.java:199) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.aliveExecutors(Nimbus.java:2029) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.computeTopologyToAliveExecutors(Nimbus.java:2109) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.computeNewSchedulerAssignments(Nimbus.java:2272) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.lockingMkAssignments(Nimbus.java:2467) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2453) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2397) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$17(Nimbus.java:1415) ~[storm-server-2.3.0.y.jar:2.3.0.y]
    ... 2 more
2020-11-04 19:56:17.703 o.a.s.u.Utils timer [ERROR] Halting process: Error while processing event
The existingAssignment comes from the in-memory backend, while topologyToExecutors comes from ZooKeeper, which did not include the deleted topology id:
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2108
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2111
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/HeartbeatCache.java#L199
As a result, the NPE occurs.
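A minimal sketch of how this kind of mismatch can surface as an NPE, assuming ids taken from one source are used to look up entries in a map built from another. The class AssignmentMismatchSketch and its methods are hypothetical and do not reproduce the actual Nimbus or HeartbeatCache code; the guarded variant only illustrates one possible defensive check, not the fix that was applied.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the actual Nimbus or HeartbeatCache logic.
final class AssignmentMismatchSketch {
    // Assumes every id in existingAssignmentIds also appears in topologyToExecutors.
    // Throws NullPointerException when an id is missing from the map.
    static Map<String, Integer> countExecutorsUnguarded(
            Set<String> existingAssignmentIds,
            Map<String, List<Integer>> topologyToExecutors) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : existingAssignmentIds) {
            List<Integer> executors = topologyToExecutors.get(id); // null if id is absent
            counts.put(id, executors.size());                      // dereferences null
        }
        return counts;
    }

    // Same computation, but skips ids that the second source does not know about.
    static Map<String, Integer> countExecutorsGuarded(
            Set<String> existingAssignmentIds,
            Map<String, List<Integer>> topologyToExecutors) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : existingAssignmentIds) {
            List<Integer> executors = topologyToExecutors.get(id);
            if (executors == null) {
                continue; // stale id: present locally, already deleted remotely
            }
            counts.put(id, executors.size());
        }
        return counts;
    }
}
```

With a stale id in the first set, the unguarded method throws, while the guarded one returns counts only for the topologies both sources agree on.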