Hi Wangda Tan,
Thanks a lot for your comments.
I took a quick look at the patch, some problems I can see now:
- It involves some unnecessary interface/parameter to NodeLabelsProvider, this also leads to unnecessary changes to NM
This patch tries to move NodeLabelsProvider from hadoop-yarn-server-nodemanager to hadoop-yarn-server-common so that both NM and RM can use it. That said, I am fine with leaving it untouched.
- Fetcher implementation is polling updated labels for ALL NMs in the cluster, if a cluster has several thousands of NMs, this can be inefficient.
Good point. We can solve this by updating the labels for all NMs in a single request instead of one by one. I will update the patch accordingly.
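To make the single-request idea concrete, here is a rough sketch, not the patch itself. The `ClusterLabelSource` interface, the `changedLabels` helper, and the node addresses are all hypothetical names made up for illustration. It assumes the label source can return the labels of every node in one call, so the RM only needs to diff the result against what it already holds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class BatchLabelFetchSketch {

    /** Hypothetical label source: returns the labels of every node in one call. */
    interface ClusterLabelSource {
        Map<String, Set<String>> fetchAllNodeLabels();
    }

    /** Returns only the nodes whose fetched labels differ from the cached view. */
    static Map<String, Set<String>> changedLabels(
            Map<String, Set<String>> cached, Map<String, Set<String>> fetched) {
        Map<String, Set<String>> changed = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : fetched.entrySet()) {
            // cached.get() is null for a node we have not seen, so it counts as changed
            if (!e.getValue().equals(cached.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Labels currently known on the RM side (illustrative values only)
        Map<String, Set<String>> cached = new HashMap<>();
        cached.put("nm1:8041", Set.of("gpu"));
        cached.put("nm2:8041", Set.of("ssd"));

        // One request returns the labels of the whole cluster
        ClusterLabelSource source = () -> Map.of(
                "nm1:8041", Set.of("gpu"),
                "nm2:8041", Set.of("large-mem"));

        Map<String, Set<String>> changed =
                changedLabels(cached, source.fetchAllNodeLabels());
        System.out.println(changed.keySet()); // prints [nm2:8041]
    }
}
```

With this shape, the cost per polling interval is one call to the label source plus an in-memory diff, instead of one call per NM.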
My biggest concern is still about if this change is must-to-have:
Since we already have a set of APIs to do this, I can't see a big add-on value of doing this inside RM.
I understand your concern, and I agree that with a cron job, some scripts, and the REST API we are able to achieve the same functionality. Still, this improvement has its own value: it can greatly reduce the additional work and other difficulties involved in integrating a label source, and it improves the usability of the label feature from a management perspective. How widely a technology is adopted often depends on how easily it can be used or integrated. Although this is not a "must-to-have", this improvement takes the label feature a step further from the integration point of view.
For large clusters, it is usually not practical to manage the labels of all nodes manually. Enterprises usually keep labels or label policies in some kind of storage. This improvement addresses that requirement with minimal additional development work. It also serves a different use case from synchronizing labels through the REST API: configuring a label provider on the YARN side is a management operation (usually done by an administrator) rather than a REST API call from a client, which makes the label source more trustworthy.
Furthermore, we aim to keep this change simple, lightweight, and straightforward. It will not add any complexity to the YARN architecture, but it will provide flexible functionality for label integration.
Thank you again for your feedback.