Details
Bug
Status: Resolved
Blocker
Resolution: Fixed
1.3.2, 1.4.1, 1.5.0, 1.6.0
Description
The allocator can crash when quota is removed, due to a race between the quota removal and the computation of the quota_allocated metric. If the metric computation is dispatched before the quota removal has removed the metric, the master fails with the following check failure:
May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: F0510 20:15:28.821099 7205 sorter.cpp:395] Check failed: 'find(clientPath)' Must be non NULL
May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: *** Check failure stack trace: ***
May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d843bdcd google::LogMessage::Fail()
May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d843dbfd google::LogMessage::SendToLog()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d843b9bc google::LogMessage::Flush()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d843e4f9 google::LogMessageFatal::~LogMessageFatal()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d791f79d google::CheckNotNull<>()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d791a3c4 mesos::internal::master::allocator::DRFSorter::allocationScalarQuantities()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d7900bc9 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_quota_allocated()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d79182f9 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEESt5_BindIFZNS0_8dispatchIdN5mesos8internal6master9allocator8internal28HierarchicalAllocatorProcessERKSsSD_SD_SD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSI_FSF_T1_T2_EOT3_OT4_EUlRSsSU_S2_E_SsSsSt12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS2_
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d83a6eac process::ProcessManager::resume()
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d83ac826 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d63d12b0 (unknown)
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d5befe25 start_thread
May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @ 0x7fd7d591d34d __clone
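The ordering behind this trace can be illustrated with a small, deterministic sketch. It is not Mesos code: ToySorter, Observation and observeMetric are hypothetical names, and a plain std::deque stands in for the queue of pending dispatches. It assumes the metric callback was queued while the role still had quota, and that the quota removal dropped the role from the sorter before the callback ran. The toy lookup reports a missing role instead of aborting, so the race shows up as a return value rather than a crash.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the sorter: allocated quantity per role.
class ToySorter {
public:
  void add(const std::string& role, double allocated) {
    clients_[role] = allocated;
  }

  void remove(const std::string& role) { clients_.erase(role); }

  // Returns nullptr for an unknown role. The real code path in the trace
  // applies a non-NULL check to a similar lookup and aborts instead.
  const double* find(const std::string& role) const {
    std::map<std::string, double>::const_iterator it = clients_.find(role);
    return it == clients_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, double> clients_;
};

// What the queued metric callback saw when it finally ran.
struct Observation {
  bool found;
  double value;
};

// Queues a metric callback, then either removes the role before the queue
// is drained (the racy order) or after it (the benign order).
Observation observeMetric(bool removeBeforeDrain) {
  ToySorter sorter;
  sorter.add("dev", 4.0);

  std::deque<std::function<void()>> pending;  // stand-in for queued dispatches
  Observation observed = {false, 0.0};

  // 1. The metric evaluation is queued while the role still exists.
  pending.push_back([&sorter, &observed]() {
    const double* allocated = sorter.find("dev");
    if (allocated != nullptr) {
      observed.found = true;
      observed.value = *allocated;
    }
  });

  // 2. In the racy order, quota removal drops the role first.
  if (removeBeforeDrain) {
    sorter.remove("dev");
  }

  // 3. The queued callback runs now.
  while (!pending.empty()) {
    std::function<void()> next = std::move(pending.front());
    pending.pop_front();
    next();
  }

  if (!removeBeforeDrain) {
    sorter.remove("dev");
  }

  return observed;
}
```

Calling observeMetric(true) models the failing order: the callback runs after the role is gone and finds nothing. Treating a missing role as "no value" is only one way to tolerate that order, and the sketch does not claim to reflect how the issue was actually resolved.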