Thanks for raising this Bibin A Chundatt and for chiming in Varun Saxena.
If we fix code as above, we will return less nodes for scheduling opportunistic containers than yarn.opportunistic-container-allocation.nodes-used configuration even though enough nodes are available. But this should be updated the very next second (as per default config) which maybe fine.
As you pointed out, this is actually fine.
Although we remove node when a node is lost from cluster nodes, we do not remove it from sorted nodes. Because for doing it we will have to iterate over the list. Can we keep a set instead ?
We had initially thought of using a SortedSet, but Insertions and deletions were somewhat expensive and a LinkedList cheaply satisfied our use-case.
Can you maybe add a test to TestNodeQueueLoadMonitor for this ?