With YARN-11005, we will be able to schedule OPPORTUNISTIC containers (OContainers) on nodes based on resource availability. Given that, nodes with a queue capacity of 0 should be allowed to run OContainers: such containers should start immediately if resources are available, even if they are placed on a "queue" first.
However, in the current implementation, setting the NM queue length to 0 is interpreted inconsistently: the RM assumes the node has infinite queue capacity, while the NM disables OContainers entirely and kills any OContainers that arrive.
This issue addresses the problems above for the QUEUE_LENGTH_THEN_RESOURCES allocator.
This issue does not aim to change the existing behavior of the QUEUE_LENGTH allocator.
The proposal is to add a new NodeManager config, opportunistic-containers-queue-policy, which specifies the queueing policy at the NM.
The initial policies are BY_RESOURCES and BY_QUEUE_LEN. If BY_RESOURCES is specified, the NM queues an OPPORTUNISTIC container as long as it has enough resources to run all pending + running containers; otherwise, it rejects the container.
If BY_QUEUE_LEN is specified, the NM accepts only as many containers as its configured queue capacity allows.
Thus, if BY_QUEUE_LEN is specified and the NM's queue capacity is configured to be 0, the NM rejects all incoming OPPORTUNISTIC containers (today's behavior).
Note that this configuration does not affect how the RM behaves.
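
The NM-side admission rule described above can be sketched roughly as follows. This is an illustration only, not the NodeManager's actual code: the class, the simplified resource type, and the method signature are all hypothetical, and it assumes that "enough resources to run all pending + running containers" includes the newly arriving container.

```java
// Illustrative sketch only: all names here are hypothetical, not real NodeManager classes.
final class NmQueuePolicySketch {

  /** The two queueing policies named in the proposal. */
  enum QueuePolicy { BY_QUEUE_LEN, BY_RESOURCES }

  /** Simplified resource vector (assumption: memory and vcores only). */
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    Res plus(Res other) {
      return new Res(memoryMb + other.memoryMb, vcores + other.vcores);
    }

    boolean fitsIn(Res capacity) {
      return memoryMb <= capacity.memoryMb && vcores <= capacity.vcores;
    }
  }

  /**
   * Decides whether an arriving OPPORTUNISTIC container is accepted.
   *
   * @param policy        configured queueing policy
   * @param queueCapacity configured maximum queue length
   * @param queued        number of containers currently queued
   * @param committed     resources of all pending + running containers
   * @param request       resources of the arriving container
   * @param nodeTotal     total resources of the node
   */
  static boolean accepts(QueuePolicy policy, int queueCapacity, int queued,
      Res committed, Res request, Res nodeTotal) {
    switch (policy) {
      case BY_QUEUE_LEN:
        // A capacity of 0 rejects every container (today's behavior).
        return queued < queueCapacity;
      case BY_RESOURCES:
        // Queue length is ignored; only the resource check matters.
        return committed.plus(request).fitsIn(nodeTotal);
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```

Under this sketch, a node with queue capacity 0 rejects everything under BY_QUEUE_LEN but can still accept containers under BY_RESOURCES while resources remain.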
At the RM, if the queue capacity reported by the node is 0 and the allocation policy is QUEUE_LENGTH_THEN_RESOURCES, the RM assumes the node can still run OPPORTUNISTIC containers if it has available resources; otherwise, it skips the node.
Conversely, if the queue capacity reported by the node is 0 and the allocation policy is QUEUE_LENGTH, the RM still assumes the node can run infinitely many OPPORTUNISTIC containers, and it is up to the NM to reject these containers (today's behavior).
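
The RM-side handling of a node that reports a queue capacity of 0 can be summarized in a small sketch. The class and method names are hypothetical and only the zero-capacity case described here is modeled; how the RM treats nodes with a non-zero capacity is out of scope.

```java
// Illustrative sketch only: names are hypothetical, not the RM's actual allocator classes.
final class RmZeroCapacitySketch {

  /** The two allocation policies discussed in this issue. */
  enum AllocationPolicy { QUEUE_LENGTH, QUEUE_LENGTH_THEN_RESOURCES }

  /**
   * Decides whether the RM still considers a node that reported
   * queue capacity 0 as a candidate for OPPORTUNISTIC containers.
   *
   * @param policy                configured allocation policy at the RM
   * @param hasAvailableResources whether the node currently has free resources
   */
  static boolean zeroCapacityNodeEligible(AllocationPolicy policy,
      boolean hasAvailableResources) {
    switch (policy) {
      case QUEUE_LENGTH_THEN_RESOURCES:
        // Eligible only if it has free resources; otherwise the node is skipped.
        return hasAvailableResources;
      case QUEUE_LENGTH:
        // Capacity 0 is treated as unbounded; the NM rejects if needed (today's behavior).
        return true;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```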