Mesos / MESOS-8904

Master crash when removing quota.

Details

    Description

      The allocator can crash when quota is removed, due to a race between the removal and the computation of the quota_allocated metric. If the metric computation is dispatched before the quota removal has removed the metric, the computation runs for a role that the sorter no longer tracks, and the following check fails:

      May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: F0510 20:15:28.821099  7205 sorter.cpp:395] Check failed: 'find(clientPath)' Must be non NULL
      May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: *** Check failure stack trace: ***
      May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d843bdcd  google::LogMessage::Fail()
      May 10 20:15:28 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d843dbfd  google::LogMessage::SendToLog()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d843b9bc  google::LogMessage::Flush()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d843e4f9  google::LogMessageFatal::~LogMessageFatal()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d791f79d  google::CheckNotNull<>()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d791a3c4  mesos::internal::master::allocator::DRFSorter::allocationScalarQuantities()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d7900bc9  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_quota_allocated()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d79182f9  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEESt5_BindIFZNS0_8dispatchIdN5mesos8internal6master9allocator8internal28HierarchicalAllocatorProcessERKSsSD_SD_SD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSI_FSF_T1_T2_EOT3_OT4_EUlRSsSU_S2_E_SsSsSt12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS2_
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d83a6eac  process::ProcessManager::resume()
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d83ac826  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d63d12b0  (unknown)
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d5befe25  start_thread
      May 10 20:15:29 int-master1.sanitized.mesosphe.re mesos-master[7189]: @     0x7fd7d591d34d  __clone
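
      The ordering behind this trace can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Mesos code: ToySorter, Mailbox, and simulateRace are hypothetical names, the mailbox stands in for an actor's FIFO event queue, and the failed check throws instead of aborting so the sketch can be exercised. The guarded variant (skip the lookup when the role is no longer tracked) is one possible mitigation, not necessarily the fix that was applied.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the sorter: maps a role to its allocated quantity.
class ToySorter {
public:
  void add(const std::string& role, double allocated) { clients_[role] = allocated; }
  void remove(const std::string& role) { clients_.erase(role); }
  bool contains(const std::string& role) const { return clients_.count(role) > 0; }

  // Models the failing check from the trace: the role must be present.
  // Throws rather than aborting the process.
  double allocation(const std::string& role) const {
    std::map<std::string, double>::const_iterator it = clients_.find(role);
    if (it == clients_.end()) {
      throw std::logic_error("Check failed: 'find(clientPath)' Must be non NULL");
    }
    return it->second;
  }

private:
  std::map<std::string, double> clients_;
};

// Hypothetical stand-in for an actor mailbox: events run strictly in FIFO order.
typedef std::deque<std::function<void()> > Mailbox;

// Replays the problematic ordering: the quota removal is processed first, then
// a metric computation that was dispatched while the metric still existed.
// Returns true and sets *value if the metric was computed, false if the check failed.
bool simulateRace(bool guarded, double* value) {
  ToySorter sorter;
  sorter.add("dev", 4.0);
  assert(sorter.contains("dev"));

  Mailbox mailbox;

  // Event 1: quota removal drops the role from the sorter.
  mailbox.push_back([&sorter]() { sorter.remove("dev"); });

  // Event 2: metric computation, already in flight when the removal ran.
  mailbox.push_back([&sorter, guarded, value]() {
    if (guarded && !sorter.contains("dev")) {
      *value = 0.0;  // Assumed mitigation: report zero for an untracked role.
      return;
    }
    *value = sorter.allocation("dev");
  });

  try {
    while (!mailbox.empty()) {
      std::function<void()> event = mailbox.front();
      mailbox.pop_front();
      event();
    }
  } catch (const std::logic_error&) {
    return false;  // The modeled check failure.
  }
  return true;
}
```

      In this model, simulateRace(false, &v) hits the modeled check failure, while simulateRace(true, &v) completes and reports zero for the removed role.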
      

          People

            Greg Mann (greggomann)
            Benjamin Mahler (bmahler)