Mesos / MESOS-7385

Framework should not starve due to `dovetailing` in naive H-DRF implementation.

    Details

      Description

      Mesos currently implements the naive H-DRF algorithm, as described in the H-DRF paper, which can starve a framework because of `dovetailing`. Essentially, the following test should pass:

      TEST_F(HierarchicalAllocatorTest, Starvation)
      {
        Clock::pause();
      
        initialize();
      
        const string ROLE1 = "a";
        const string ROLE2 = "b/c";
        const string ROLE3 = "b/d";
      
        FrameworkInfo framework1 = createFrameworkInfo({ROLE1});
        allocator->addFramework(framework1.id(), framework1, {}, true);
      
        SlaveInfo agent1 = createSlaveInfo("cpus:1");
        allocator->addSlave(
            agent1.id(),
            agent1,
            AGENT_CAPABILITIES(),
            None(),
            agent1.resources(),
            {});
      
        // `framework1` will be offered all of the resources on `agent1`.
        {
          Allocation expected = Allocation(
              framework1.id(),
              {{ROLE1, {{agent1.id(), agent1.resources()}}}});
      
          AWAIT_EXPECT_EQ(expected, allocations.get());
        }
      
        // Create `framework2` in the child role.
        FrameworkInfo framework2 = createFrameworkInfo({ROLE2});
        allocator->addFramework(framework2.id(), framework2, {}, true);
      
        SlaveInfo agent2 = createSlaveInfo("mem:32");
        allocator->addSlave(
            agent2.id(),
            agent2,
            AGENT_CAPABILITIES(),
            None(),
            agent2.resources(),
            {});
      
        {
          Allocation expected = Allocation(
              framework2.id(),
              {{ROLE2, {{agent2.id(), agent2.resources()}}}});
      
          AWAIT_EXPECT_EQ(expected, allocations.get());
        }
      
        // Create `framework3` in the child role.
        FrameworkInfo framework3 = createFrameworkInfo({ROLE3});
        allocator->addFramework(framework3.id(), framework3, {}, true);
      
        SlaveInfo agent3 = createSlaveInfo("cpus:1");
        allocator->addSlave(
            agent3.id(),
            agent3,
            AGENT_CAPABILITIES(),
            None(),
            agent3.resources(),
            {});
      
        // Current fair share is:
        // - `framework1`: 50% (1/2 cpus)
        // - `framework2`: 100% (32/32 mem)
        // - `framework3`: 0% (0/2 cpus)
        // So `framework3` should be offered all of the resources on `agent3`.
        // However, `framework3` is penalized by the naive H-DRF
        // implementation: its parent role `b` already has a fair share of
        // 100%, which leads to starvation.
        {
          Allocation expected = Allocation(
              framework3.id(),
              {{ROLE3, {{agent3.id(), agent3.resources()}}}});
      
          AWAIT_EXPECT_EQ(expected, allocations.get()); // It fails!
        }
      }
      

      This ticket records the behavior so that it can be addressed in the future. Note that it also affects the current implementation without hierarchical roles.
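The arithmetic behind the failing expectation can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `Resources`, `dominantShare`, `add`, and `pickTopLevelRole` are hypothetical helpers written for this ticket, not Mesos allocator or sorter APIs, and the selection rule is a simplified reading of naive H-DRF (each parent role is ranked by the dominant share of everything allocated in its subtree).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical type for illustration: resource name -> quantity.
using Resources = std::map<std::string, double>;

// Dominant share: the largest fraction of any single resource that
// `allocated` holds out of the cluster `total`.
double dominantShare(const Resources& allocated, const Resources& total)
{
  double share = 0.0;
  for (const auto& entry : allocated) {
    auto it = total.find(entry.first);
    if (it != total.end() && it->second > 0.0) {
      share = std::max(share, entry.second / it->second);
    }
  }
  return share;
}

// Sum of two resource maps; used to aggregate the children of a role.
Resources add(Resources left, const Resources& right)
{
  for (const auto& entry : right) {
    left[entry.first] += entry.second;
  }
  return left;
}

// Simplified naive H-DRF step at the top level: pick the role whose
// aggregated subtree allocation has the smallest dominant share.
// Ties go to the lexicographically smaller role name.
std::string pickTopLevelRole(
    const std::map<std::string, Resources>& roles,
    const Resources& total)
{
  std::string best;
  double bestShare = 0.0;
  bool found = false;
  for (const auto& entry : roles) {
    const double share = dominantShare(entry.second, total);
    if (!found || share < bestShare) {
      best = entry.first;
      bestShare = share;
      found = true;
    }
  }
  return best;
}
```

With the numbers from the test (2 cpus and 32 mem in total, `a` holding 1 cpu, `b/c` holding 32 mem, `b/d` holding nothing), role `a` has a share of 0.5 and the aggregated role `b` has a share of 1.0. The sketch therefore picks `a` for the resources of `agent3`, even though `b/d` has received nothing, which is the starvation the test is meant to catch.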

            People

            • Assignee:
              Unassigned
              Reporter:
              guoger Jay Guo