Details
Bug
Status: Resolved
Major
Resolution: Fixed
3.1.0
None
Reviewed
Description
When an application sets a placement constraint without specifying a nodePartition, the default partition is always chosen as the constraint when allocating containers. This is a problem when the application is submitted to a queue that does not have enough capacity available on the default partition.
This is a common scenario when node labels are configured for a particular queue. The below sample sleeper service cannot get even a single container allocated when it is submitted to a "labeled_queue", even though enough capacity is available on the label/partition configured for the queue. Only the AM container runs.
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": {
        "cpus": 1,
        "memory": "4096"
      },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [ "sleeper" ]
          }
        ]
      }
    }
  ]
}
It runs fine if I specify the node_partition explicitly in the constraints like below.
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": {
        "cpus": 1,
        "memory": "4096"
      },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [ "sleeper" ],
            "node_partitions": [ "label" ]
          }
        ]
      }
    }
  ]
}
The problem seems to be that only the default partition "" is considered when the node_partition constraint is not specified, as seen in the RM log below.
2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367)) - Successfully added SchedulingRequest to app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper]. nodePartition=
However, I think it makes more sense to consider "*", or the default-node-label-expression of the queue if configured, when no node_partition is specified in the placement constraint. Not specifying any node_partition should ideally mean that placement is not restricted to any particular node_partition. Currently, the default partition is enforced instead.
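
To make the difference between the reported behavior and the suggested one concrete, here is a minimal, self-contained Java sketch. It is not the actual SingleConstraintAppPlacementAllocator code: the class name, method names, and the order in which the fallbacks are tried (queue default-node-label-expression first, then "*") are assumptions made purely for illustration.

```java
public final class PartitionFallbackSketch {
    // Values taken from the report: "" is the default partition, "*" means any partition.
    public static final String DEFAULT_PARTITION = "";
    public static final String ANY_PARTITION = "*";

    private PartitionFallbackSketch() {
    }

    /**
     * Behavior described in the report: a constraint without a node partition
     * (modeled here as null) is pinned to the default partition "".
     */
    public static String reportedBehavior(String constraintPartition) {
        return constraintPartition == null ? DEFAULT_PARTITION : constraintPartition;
    }

    /**
     * Suggested behavior (hypothetical ordering): an explicit partition wins,
     * then the queue's default-node-label-expression if one is configured,
     * otherwise "*" so that no partition is enforced.
     */
    public static String suggestedBehavior(String constraintPartition,
                                           String queueDefaultLabelExpression) {
        if (constraintPartition != null) {
            return constraintPartition;
        }
        if (queueDefaultLabelExpression != null && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        return ANY_PARTITION;
    }

    public static void main(String[] args) {
        // No node partition in the constraint, queue defaults to "label".
        System.out.println("reported:  '" + reportedBehavior(null) + "'");
        System.out.println("suggested: '" + suggestedBehavior(null, "label") + "'");
    }
}
```

With the sleeper service above submitted to "labeled_queue", the reported behavior resolves to the default partition, where the queue has no capacity, while the suggested fallback would resolve to the queue's own label.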