Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
- SchedulerProcess has a field, metrics, whose constructor registers two metrics, event_queue_messages and event_queue_dispatches.
- These metrics are implemented by defer'ing a message to SchedulerProcess.
- If MesosSchedulerDriver is started and then stopped (but not destructed), SchedulerProcess is terminated but not destroyed.
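Hence, once a scheduler driver has been started and then stopped, fetching these metrics hangs: the deferred message targets a process that has terminated, so it is never handled and the metric's value never becomes available. Any test case that calls Metrics() after stopping a scheduler driver (without destructing it) will therefore hang.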
For example, the following patch will hang SlaveTest.MetricsSlaveLaunchErrors.
diff --git a/src/tests/slave_tests.cpp b/src/tests/slave_tests.cpp
index 3471314..f323bb9 100644
--- a/src/tests/slave_tests.cpp
+++ b/src/tests/slave_tests.cpp
@@ -1408,12 +1408,12 @@ TEST_F(SlaveTest, MetricsSlaveLaunchErrors)
   AWAIT_READY(failureUpdate);
   ASSERT_EQ(TASK_FAILED, failureUpdate.get().state());

+  driver.stop();
+  driver.join();
+
   // After failure injection, metrics should report a single failure.
   snapshot = Metrics();
   EXPECT_EQ(1, snapshot.values["slave/container_launch_errors"]);
-
-  driver.stop();
-  driver.join();
 }
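The mechanism described above can be illustrated with a toy model. This is only a sketch using the C++ standard library, not libprocess or Mesos code: ToyProcess, its mailbox, and metricReady are hypothetical names invented for illustration, and the real dispatch path may differ in detail. It assumes that a terminated-but-not-destroyed process still accepts deferred calls but never runs them, so the caller's future is never satisfied.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <utility>

// Hypothetical stand-in for an actor with a mailbox. NOT libprocess.
class ToyProcess {
public:
  // Queue a call to run "on the process" and return a future for its result.
  std::future<double> defer(std::function<double()> f) {
    auto promise = std::make_shared<std::promise<double>>();
    std::future<double> result = promise->get_future();
    // The queued task keeps the promise alive until it is run.
    mailbox_.push_back([promise, f]() { promise->set_value(f()); });
    return result;
  }

  // Run all queued calls. A terminated process never services its mailbox.
  void drain() {
    if (terminated_) {
      return;
    }
    while (!mailbox_.empty()) {
      std::function<void()> task = std::move(mailbox_.front());
      mailbox_.pop_front();
      task();
    }
  }

  // Terminated but not destroyed: the object (and its mailbox) still exists.
  void terminate() { terminated_ = true; }

private:
  bool terminated_ = false;
  std::deque<std::function<void()>> mailbox_;
};

// Fetch a "metric" through a deferred call. Returns true if the value
// became available within a short timeout, false if the fetch would hang.
bool metricReady(ToyProcess& process, bool stopFirst) {
  if (stopFirst) {
    process.terminate();
  }
  std::future<double> value = process.defer([] { return 0.0; });
  process.drain();
  return value.wait_for(std::chrono::milliseconds(50)) ==
         std::future_status::ready;
}
```

In this model, metricReady returns true for a running process and false for one that was terminated first. An unbounded wait on the second future would block forever, which corresponds to the hang that the patched test exhibits. The bounded wait is used here only so that the sketch terminates.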
Issue Links
- causes: MESOS-8976 MasterTest.LaunchDuplicateOfferLost is flaky (Open)
- is related to: MESOS-6228 Add timeout to /metrics/snapshot calls in tests (Open)
- relates to: MESOS-6308 CHECK failure in DRF sorter. (Resolved)