Discussed offline with Jian He; we think a few things need to be fixed here:
1. Fix the asymmetric behavior of register()/unregisterSource() in MetricsSystemImpl: the source name is still left in sourceNames.map in DefaultMetricsSystem after unregisterSource().
2. ContainerMetrics.finished() could get called twice: once from the container life cycle (involved in YARN-4906) and once from the container monitoring life cycle. Ideally, ContainerMetrics.finished() would be called only once, in one place, for a given container. In practice, however, the container event life cycle and the container monitor event life cycle are independent and cannot replace each other. As an alternative, we will make sure scheduleTimerTaskForUnregistration() is called only once; otherwise there will be more unregistration threads than needed.
3. If a ContainerMetrics has already been finished (triggered via ContainerDoneTransition by ContainerKillEvent, ContainerDoneEvent, etc.), the current logic in ContainerMonitorImpl.updateContainerMetrics(ContainersMonitorEvent) will still register the metrics into DefaultMetricsSystem first (via ContainerMetrics.forContainer(...)) and unregister them from DefaultMetricsSystem soon after. This is completely unnecessary.
Will deliver a fix for the three issues raised above.
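
To make the intent concrete, here is a rough, self-contained sketch of the three ideas. It is not the actual patch and does not use the Hadoop classes: SourceRegistry, ContainerMetricsSketch, recordUsage() and the scheduledUnregistrations counter are hypothetical stand-ins, and unregistration runs immediately instead of through a delayed timer task.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class MetricsLifecycleSketch {

    /** Hypothetical stand-in for the source-name bookkeeping (issue 1). */
    static final class SourceRegistry {
        private final Set<String> names = ConcurrentHashMap.newKeySet();

        /** Registers a name and rejects duplicates. */
        void register(String name) {
            if (!names.add(name)) {
                throw new IllegalStateException("Metrics source " + name + " already exists!");
            }
        }

        /** Symmetric counterpart: forgets the name so it can be registered again. */
        void unregister(String name) {
            names.remove(name);
        }

        boolean contains(String name) {
            return names.contains(name);
        }
    }

    /** Hypothetical stand-in for per-container metrics (issues 2 and 3). */
    static final class ContainerMetricsSketch {
        private final String name;
        private final SourceRegistry registry;
        private final AtomicBoolean finished = new AtomicBoolean(false);
        // Counts how often unregistration was scheduled; exists only for this sketch.
        final AtomicInteger scheduledUnregistrations = new AtomicInteger();

        ContainerMetricsSketch(String name, SourceRegistry registry) {
            this.name = name;
            this.registry = registry;
            registry.register(name);
        }

        /** May be called from both life cycles; only the first call schedules unregistration. */
        void finished() {
            if (finished.compareAndSet(false, true)) {
                scheduleUnregistration();
            }
        }

        /** Monitoring path: skip the update (and any re-registration) once finished. */
        boolean recordUsage(long memoryBytes) {
            if (finished.get()) {
                return false;
            }
            // A real implementation would update gauges here.
            return true;
        }

        // Assumption: the real code defers this via a timer; here it runs at once.
        private void scheduleUnregistration() {
            scheduledUnregistrations.incrementAndGet();
            registry.unregister(name);
        }
    }

    public static void main(String[] args) {
        SourceRegistry registry = new SourceRegistry();
        ContainerMetricsSketch metrics = new ContainerMetricsSketch("container_1", registry);
        metrics.finished(); // container life cycle
        metrics.finished(); // container monitoring life cycle
        System.out.println("unregistrations scheduled: " + metrics.scheduledUnregistrations.get());
        System.out.println("name still registered: " + registry.contains("container_1"));
        System.out.println("monitor update applied: " + metrics.recordUsage(1024L));
    }
}
```

The compareAndSet guard is what keeps the second finished() call from scheduling another unregistration, and because unregister() removes the name, the same source name can be registered again later without a duplicate-name error.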