Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Mesos currently implements the naive H-DRF algorithm, as described in the H-DRF paper, which may cause starvation due to `dovetailing`. Essentially, the following test should pass:
TEST_F(HierarchicalAllocatorTest, Starvation)
{
  Clock::pause();

  initialize();

  const string ROLE1 = "a";
  const string ROLE2 = "b/c";
  const string ROLE3 = "b/d";

  FrameworkInfo framework1 = createFrameworkInfo({ROLE1});
  allocator->addFramework(framework1.id(), framework1, {}, true);

  SlaveInfo agent1 = createSlaveInfo("cpus:1");
  allocator->addSlave(
      agent1.id(),
      agent1,
      AGENT_CAPABILITIES(),
      None(),
      agent1.resources(),
      {});

  // `framework1` will be offered all of the resources on `agent1`.
  {
    Allocation expected = Allocation(
        framework1.id(),
        {{ROLE1, {{agent1.id(), agent1.resources()}}}});

    AWAIT_EXPECT_EQ(expected, allocations.get());
  }

  // Create `framework2` in the child role.
  FrameworkInfo framework2 = createFrameworkInfo({ROLE2});
  allocator->addFramework(framework2.id(), framework2, {}, true);

  SlaveInfo agent2 = createSlaveInfo("mem:32");
  allocator->addSlave(
      agent2.id(),
      agent2,
      AGENT_CAPABILITIES(),
      None(),
      agent2.resources(),
      {});

  {
    Allocation expected = Allocation(
        framework2.id(),
        {{ROLE2, {{agent2.id(), agent2.resources()}}}});

    AWAIT_EXPECT_EQ(expected, allocations.get());
  }

  // Create `framework3` in the child role.
  FrameworkInfo framework3 = createFrameworkInfo({ROLE3});
  allocator->addFramework(framework3.id(), framework3, {}, true);

  SlaveInfo agent3 = createSlaveInfo("cpus:1");
  allocator->addSlave(
      agent3.id(),
      agent3,
      AGENT_CAPABILITIES(),
      None(),
      agent3.resources(),
      {});

  // Current fair share is:
  // - `framework1`: 50% (1/2 cpus)
  // - `framework2`: 100% (32/32 mem)
  // - `framework3`: 0% (0/2 cpus)
  // So `framework3` should be offered all of the resources on `agent3`.
  // However, `framework3` is punished due to naive h-drf implementation,
  // where fair share of parent role `b` has fair share of 100%, which
  // leads to starvation.
  {
    Allocation expected = Allocation(
        framework3.id(),
        {{ROLE3, {{agent3.id(), agent3.resources()}}}});

    AWAIT_EXPECT_EQ(expected, allocations.get()); // It fails!
  }
}
This JIRA is created to make sure this behavior is captured and will be addressed in the future. Note that it affects the current implementation without hierarchical roles as well.
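The share arithmetic from the test's comments can be reproduced with a small standalone sketch. This is not the Mesos sorter code: the `Resources` alias and the helpers `dominantShare`, `sumAllocations`, and `nextTopLevelRole` are hypothetical names introduced only for illustration, and the sketch assumes the naive scheme in which a parent role's allocation is simply the sum of its children's allocations.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Resource name -> quantity, e.g. {"cpus", 1}.
// Hypothetical type for illustration; not a Mesos API.
using Resources = std::map<std::string, double>;

// Dominant share: the largest fraction of any single cluster
// resource that the given allocation holds.
double dominantShare(const Resources& allocated, const Resources& total)
{
  double share = 0.0;
  for (const auto& entry : allocated) {
    auto it = total.find(entry.first);
    if (it != total.end() && it->second > 0.0) {
      share = std::max(share, entry.second / it->second);
    }
  }
  return share;
}

// Naive H-DRF (assumed here): a parent's allocation is the
// sum of its children's allocations.
Resources sumAllocations(const std::vector<Resources>& children)
{
  Resources result;
  for (const Resources& child : children) {
    for (const auto& entry : child) {
      result[entry.first] += entry.second;
    }
  }
  return result;
}

// Returns the top-level role with the smallest dominant share,
// i.e. the role that would be considered first for the next offer.
std::string nextTopLevelRole(
    const std::map<std::string, Resources>& roles,
    const Resources& total)
{
  assert(!roles.empty());
  auto best = std::min_element(
      roles.begin(),
      roles.end(),
      [&total](const auto& left, const auto& right) {
        return dominantShare(left.second, total) <
               dominantShare(right.second, total);
      });
  return best->first;
}
```

With a cluster total of `cpus:2` and `mem:32`, role `a` holding `cpus:1` has a dominant share of 50%, while role `b`, as the sum of `b/c` (`mem:32`) and `b/d` (nothing), has a dominant share of 100%. Under these assumptions `nextTopLevelRole` picks `a`, so the CPU on `agent3` is never considered for `b/d`, even though `framework3` itself holds 0% of any resource.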