Thanks Wangda Tan, for the feedback, but had few things to discuss.
1) All script provider related configurations/logic should be removed.
Will take care of it, earlier thought at-least white-list entities let it be there but not a problem will correct it.
PreviousNodeLabels will be reset every time if we do a fetch. (To avoid handle same node labels as much as possible)
Don't do check if new fetched node label is as same as previousNodeLabels. (Also, avoid handle same node label)
May be I dint get your approach completely here ? but based on earlier discussions in 2495 we had finalized that we will send the node labels only when there is a change because
- unnecessary Network traffic which is amplified in a large cluster which has lot of NM's
- Either RM needs to do the validation of whether the labels are modified or blindly update the NodeLabelsStore. In former case is not better because RM will be overloaded to do this operation and in later case unnecessary Store file updates, i feel the magnitude of updates which will be happening in the large cluster with heartbeat coming in every second is too high and all are unwanted updates.
Based on the above points i feel its better to check and update RM only when there is a change in the NM's node labels
Don't reset node label if fetched node label is incorrect. (This should be a part of error handling, we should treat it's a error to be avoided instead of force reset it)
IIUC this was the conclusion we had in
YARN-2495 in the following comment. which i feel is correct because assume a case where in label is based on java version(/lib version) and java version is upgraded in a given NM but the centralized labels new java version is not yet added then NM will fail to update the node labels but if we do not reset it, RM will maintain the NM's NodeLabel as of Prev java version and the client app if run on this node it might fail due to this.
A little cosmetic suggestion.
Hmm agree on this but approach i felt we can have a inner class like NMDistributedNodeLabelsHandler and that can maintain the state of previous labels and takes care of initializing and taking care of all methods and logging related to NodeLabels . I have already added some methods but we require to maintain some state of labels in the heartbeat flow so introducing a class like this and pushing all methods related to labels there would be much better and more readable, Thoughts ?
I suggest to keep provider within nodemanager