Brief steps to reproduce
- Enable async scheduling with 5 scheduling threads
- Submit a large number of jobs to try to exhaust cluster resources
- After a while, observe that the resource allocated on an NM is more than the total resource requested by the containers allocated on it
It looks like the commit phase does not synchronize its handling of reserved containers, so some proposals are incorrectly accepted and resource is subsequently deducted multiple times for a single container.
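
To illustrate the suspected failure mode, here is a minimal toy model of a check-then-apply commit. It is only a sketch: `CommitRaceSketch`, `ToyNode`, and all method names are hypothetical and are not the actual scheduler classes. It replays, deterministically, the interleaving in which two proposals for the same reserved container both pass the check before either is applied, and compares that with a commit where check and apply happen as one guarded step.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; hypothetical names, not the real scheduler code. */
final class CommitRaceSketch {

    /** Hypothetical stand-in for the per-node resource bookkeeping. */
    static final class ToyNode {
        private final Set<String> reserved = new HashSet<>();
        private int allocatedMb = 0;

        void reserve(String containerId) {
            reserved.add(containerId);
        }

        /** Check step: a proposal is acceptable only while its reservation exists. */
        boolean check(String containerId) {
            return reserved.contains(containerId);
        }

        /** Apply step: convert the reservation to an allocation and deduct resource. */
        void apply(String containerId, int mb) {
            reserved.remove(containerId);
            allocatedMb += mb;
        }

        /** Check and apply as one step, guarded by the node's monitor. */
        synchronized boolean tryCommit(String containerId, int mb) {
            if (!check(containerId)) {
                return false; // reservation already consumed, reject the proposal
            }
            apply(containerId, mb);
            return true;
        }

        int allocatedMb() {
            return allocatedMb;
        }
    }

    /** Replays the bad interleaving: both proposals are checked before either is applied. */
    static int unsafeCommit(int mb) {
        ToyNode node = new ToyNode();
        node.reserve("c1");
        boolean okA = node.check("c1"); // proposal A sees the reservation
        boolean okB = node.check("c1"); // proposal B sees the same reservation
        if (okA) node.apply("c1", mb);
        if (okB) node.apply("c1", mb);  // second deduction for the same container
        return node.allocatedMb();
    }

    /** Same two proposals, but each commit checks and applies atomically. */
    static int safeCommit(int mb) {
        ToyNode node = new ToyNode();
        node.reserve("c1");
        node.tryCommit("c1", mb); // accepted
        node.tryCommit("c1", mb); // rejected, reservation is gone
        return node.allocatedMb();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized: " + unsafeCommit(1024) + " MB"); // 2048 MB
        System.out.println("synchronized:   " + safeCommit(1024) + " MB");   // 1024 MB
    }
}
```

In the unguarded replay the node ends up with twice the requested amount deducted for one container, which matches the symptom above. Whether the real commit path fails in exactly this way is an assumption that still needs to be confirmed against the scheduler code.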