- In the case of STATE.STOPPED both stop() and reboot() need to be called.
- I don't see anywhere in the call hierarchy where NodeManager.stop() is ever called in the reboot case.
- Following the logic through, it looks like this is the call sequence from your patch:
- calls: NodeStatusUpdaterImpl.reboot()
- calls: AbstractService.stop()
- calls: AbstractService.changeState()
- calls: NodeManager.stateChanged()
- Which does not call NodeManager.stop() if NodeStatusUpdaterImpl.isRebooted is set.
stop() was calling from reboot method. Anyway now I have moved it to common for SHUTDOWN and REBOOT cases.
- In order for the new private static variable nodeManager to be fully effective, you should take out the local NodeManager declaration within main(). Otherwise, the new getNodeManager() method won't always return the NodeManager instance.
It was added only for reboot testability purpose. Now I have refactored the code.
- Please look into the new findbugs and core test failures.
These test failures are not related to this patch.
- NodeManager.main() / NodeManager.reboot():
- Rather than copy the same code from main() into the new reboot() method, I would create a common method (call it commonNMMain, for example). You will probably need a parameter flag to tell you whether or not this is called from main() or from reboot().
I have refactored the code accordingly.
- You should handle the case where nodeManagerShutdownHook is null. Even though it should not be null, if it is null, you should create a new instance of it.
There will not be a case where it will be null. This condition is added only for test purpose.
- Rather than calling getConfig() when calling nodeManager.init(), I would argue that it would be better to create a new instance of conf so that any updates to the NM configs will be picked up. I guess it could be argued the other way, too. That is, you could make the case that the grid owner doesn't want the NM configs to change just because the RM was restarted. However, I think it should be a new NM config because the RM will have a new config.
I don't feel it is a good idea to reload the configuration when the NM gets rebooted. If the user want to update any property then they should restart the NM manually.
- YarnRPC rpc; is declared but never used.
It was my mistake, I was doing some thing and forgot to remove. I have updated now.
- I don't see where it is testing the RM-restart / NM-reboot case.
This test case verifies that when NM gets reboot signal, current instance will be stopped and new instance will be started successfully.