MESOS-10231

Mesos master crashes during framework teardown


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: framework, master
    • Labels: None
    • Environment: CentOS Linux release 7.9.2009, Mesos version 1.9.0

    Description

      I have set up a Mesos cluster with a single Mesos master, and I submit Spark jobs to it in "cluster" mode.

      After a few Spark jobs run successfully, the Mesos master crashes while trying to shut down one of the Spark frameworks, with the following error:

       

      F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: totalOfferedResources.filter(allocatedToRole).empty() 
      *** Check failure stack trace: ***
          @     0x7f1e024ded2e  google::LogMessage::Fail()
          @     0x7f1e024dec8d  google::LogMessage::SendToLog()
          @     0x7f1e024de637  google::LogMessage::Flush()
          @     0x7f1e024e191c  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f1dff93978d  mesos::internal::master::Framework::untrackUnderRole()
          @     0x7f1dffad004b  mesos::internal::master::Master::removeFramework()
          @     0x7f1dfface859  mesos::internal::master::Master::teardown()
          @     0x7f1dffa8ba25  mesos::internal::master::Master::receive()
          @     0x7f1dffb2f1cf  ProtobufProcess<>::handlerMutM<>()
          @     0x7f1dffbe6809  std::__invoke_impl<>()
          @     0x7f1dffbdae22  std::__invoke<>()
          @     0x7f1dffbc8079  _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEES4_SD_St12_PlaceholderILi1EESO_ILi2EEEE6__callIvJS8_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
          @     0x7f1dffbaaae5  std::_Bind<>::operator()<>()
          @     0x7f1dffb833c9  std::_Function_handler<>::_M_invoke()
          @     0x7f1dff330281  std::function<>::operator()()
          @     0x7f1dffb13329  ProtobufProcess<>::consume()
          @     0x7f1dffa85436  mesos::internal::master::Master::_consume()
          @     0x7f1dffa84ad5  mesos::internal::master::Master::consume()
          @     0x7f1dffafb9ae  _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
          @     0x564c359f7002  process::ProcessBase::serve()
          @     0x7f1e023a7bbd  process::ProcessManager::resume()
          @     0x7f1e023a407c  _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
          @     0x7f1e023cf1ba  _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
          @     0x7f1e023cd9c9  _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
          @     0x7f1e023cc482  _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
          @     0x7f1e023cb53b  _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
          @     0x7f1e023ca3c4  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEEE6_M_runEv
          @     0x7f1e051f419d  execute_native_thread_routine
          @     0x7f1df4200ea5  start_thread
          @     0x7f1df3f2996d  __clone
      

      A fatal assertion (the CHECK at framework.cpp:671, reached from Framework::untrackUnderRole() during Master::removeFramework()) is failing, but I have not been able to determine the root cause.
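
      For readers unfamiliar with the failing condition, the following is a minimal sketch of the invariant that the logged expression `totalOfferedResources.filter(allocatedToRole).empty()` appears to express: a framework should have no outstanding offered resources allocated to a role at the moment it is untracked under that role. This is not the Mesos source. The `Resource` and `Framework` structs, the `offeredToRole` and `canUntrackUnderRole` helpers, and the role name "spark" are hypothetical simplifications; only `totalOfferedResources`, `allocatedToRole`, and `untrackUnderRole` are names taken from the log above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a Mesos resource: only the role it
// is allocated to and a scalar amount.
struct Resource {
  std::string allocatedRole;
  double amount;
};

// Hypothetical stand-in for the master's per-framework bookkeeping.
struct Framework {
  std::vector<Resource> totalOfferedResources;
};

// Returns the offered resources that are allocated to `role`.
std::vector<Resource> offeredToRole(const Framework& framework,
                                    const std::string& role) {
  auto allocatedToRole = [&role](const Resource& resource) {
    return resource.allocatedRole == role;
  };

  std::vector<Resource> result;
  std::copy_if(framework.totalOfferedResources.begin(),
               framework.totalOfferedResources.end(),
               std::back_inserter(result),
               allocatedToRole);
  return result;
}

// True when no offered resources remain for `role`, i.e. the invariant holds.
bool canUntrackUnderRole(const Framework& framework, const std::string& role) {
  return offeredToRole(framework, role).empty();
}

// Models a fatal check with assert(): the process aborts if offers for
// `role` are still outstanding when the framework is untracked.
void untrackUnderRole(const Framework& framework, const std::string& role) {
  assert(canUntrackUnderRole(framework, role));
  // Role-level bookkeeping would be updated here in a real system.
}
```

      Under these assumptions, a framework that still holds an offer for "spark" when `untrackUnderRole(framework, "spark")` is called would abort the process, which is the kind of failure the stack trace shows. The sketch does not explain why such an offer would still be outstanding during teardown.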

          People

            Assignee: Unassigned
            Reporter: Divyansh Jamuaar (arch_dj)
            Votes: 0
            Watchers: 2
