MESOS-6231

Scheduler driver metrics can hang Metrics() in tests

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • test

    Description

      • SchedulerProcess has a field, metrics, whose constructor registers two metrics: event_queue_messages and event_queue_dispatches.
      • Both metrics are evaluated by defer'ing to SchedulerProcess.
      • If MesosSchedulerDriver is started and then stopped (but not destructed), SchedulerProcess is terminated but not destroyed, so the metrics remain registered.

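      Hence, once a scheduler driver has been started and then stopped, fetching either metric hangs: the metric is still registered, but the process it defers to has terminated and never answers. Any test case that calls Metrics() after stopping a scheduler driver will therefore hang.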
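
The mechanism can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyProcess, readGauge and gaugeReady are hypothetical names invented for this illustration, and the code uses std::future rather than libprocess, so it does not reproduce the real SchedulerProcess or defer implementation. It only shows why a value that is computed by handing work to a terminated (but still existing) process never becomes ready.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical toy actor: work is queued and only runs when drain() is
// called on a process that has not been terminated.
class ToyProcess {
public:
  void enqueue(std::function<void()> work) { mailbox_.push(std::move(work)); }

  // Runs queued work; does nothing once the process is terminated.
  void drain() {
    while (!terminated_ && !mailbox_.empty()) {
      std::function<void()> work = std::move(mailbox_.front());
      mailbox_.pop();
      work();
    }
  }

  // Terminated but not destroyed: the object, and anything that still
  // points at it, stays alive.
  void terminate() { terminated_ = true; }

private:
  std::queue<std::function<void()>> mailbox_;
  bool terminated_ = false;
};

// Hypothetical gauge: reading it hands the computation to the owning
// process and returns a future for the result.
std::future<int> readGauge(ToyProcess& process, int value) {
  auto promise = std::make_shared<std::promise<int>>();
  std::future<int> result = promise->get_future();
  process.enqueue([promise, value]() { promise->set_value(value); });
  return result;
}

// Returns true if the gauge produces a value within 50 ms. With
// terminateFirst set, the process is terminated before the gauge is read.
bool gaugeReady(bool terminateFirst) {
  ToyProcess scheduler;
  if (terminateFirst) {
    scheduler.terminate();
  }
  std::future<int> value = readGauge(scheduler, 0);
  scheduler.drain();
  return value.wait_for(std::chrono::milliseconds(50)) ==
         std::future_status::ready;
}
```

In this model gaugeReady(false) returns true and gaugeReady(true) returns false. The timeout is what keeps the second call from blocking forever; a caller that waits on the future without a timeout would hang, which is the behaviour described above.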

      For example, the following patch causes SlaveTest.MetricsSlaveLaunchErrors to hang.

      diff --git a/src/tests/slave_tests.cpp b/src/tests/slave_tests.cpp
      index 3471314..f323bb9 100644
      --- a/src/tests/slave_tests.cpp
      +++ b/src/tests/slave_tests.cpp
      @@ -1408,12 +1408,12 @@ TEST_F(SlaveTest, MetricsSlaveLaunchErrors)
         AWAIT_READY(failureUpdate);
         ASSERT_EQ(TASK_FAILED, failureUpdate.get().state());
      
      +  driver.stop();
      +  driver.join();
      +
         // After failure injection, metrics should report a single failure.
         snapshot = Metrics();
         EXPECT_EQ(1, snapshot.values["slave/container_launch_errors"]);
      -
      -  driver.stop();
      -  driver.join();
       }
      
      
      

      Attachments

        1. consoleText.txt
          334 kB
          Chun-Hung Hsiao

            People

              Assignee: Unassigned
              Reporter: Neil Conway (neilc)