When an application sets a placement constraint without specifying a nodePartition, the default partition is always chosen as the constraint when allocating containers. This can be a problem when the application is submitted to a queue which doesn't have enough capacity available on the default partition.
This is a common scenario when node labels are configured for a particular queue. The sample sleeper service below cannot get even a single container allocated when it is submitted to a "labeled_queue", even though enough capacity is available on the label/partition configured for the queue. Only the AM container runs.
It runs fine if I specify the node_partition explicitly in the constraints like below.
The problem seems to be that only the default partition "" is considered when the node_partition constraint is not specified, as seen in the RM log below.
However, I think it makes more sense to consider "*", or the default-node-label-expression of the queue if one is configured, when no node_partition is specified in the placement constraint. Not specifying any node_partition should ideally mean that we don't enforce the placement constraint on any particular node_partition, but right now we are enforcing the default partition instead.
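To make the proposal concrete, here is a minimal sketch of the fallback I have in mind. It is not taken from the YARN code base: the class PartitionResolver, the method resolve and both constants are hypothetical names used only for illustration, and trying the queue's default-node-label-expression before falling back to "*" is my assumption about the ordering.

```java
/**
 * Hypothetical helper illustrating the proposed fallback.
 * None of these names exist in YARN; they are assumptions for this sketch.
 */
final class PartitionResolver {
    /** The default (unlabeled) partition is represented by the empty string. */
    static final String DEFAULT_PARTITION = "";
    /** Wildcard meaning "do not restrict to any particular partition". */
    static final String ANY_PARTITION = "*";

    private PartitionResolver() {}

    /**
     * @param constraintPartition partition named in the placement constraint,
     *                            or null when the constraint does not name one
     * @param queueDefaultLabelExpression the queue's default-node-label-expression,
     *                            or null/empty when the queue has none configured
     * @return the partition the constraint should be evaluated against
     */
    static String resolve(String constraintPartition,
                          String queueDefaultLabelExpression) {
        // An explicit value, including "" for the default partition, always wins.
        if (constraintPartition != null) {
            return constraintPartition;
        }
        // Assumed ordering: prefer the queue's configured default label expression.
        if (queueDefaultLabelExpression != null
                && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        // Otherwise do not pin the constraint to any single partition.
        return ANY_PARTITION;
    }
}
```

With this behavior, a constraint without a node_partition submitted to a queue whose default label expression is, say, "label1" would be evaluated against "label1" rather than against the default partition "".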