Description
I have set up a Mesos cluster with a single Mesos master, and I submit Spark jobs to it in "cluster" mode.
After a few Spark jobs run successfully, the Mesos master crashes while trying to shut down one of the Spark frameworks, with the following error:
F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: totalOfferedResources.filter(allocatedToRole).empty()
*** Check failure stack trace: ***
@ 0x7f1e024ded2e google::LogMessage::Fail()
@ 0x7f1e024dec8d google::LogMessage::SendToLog()
@ 0x7f1e024de637 google::LogMessage::Flush()
@ 0x7f1e024e191c google::LogMessageFatal::~LogMessageFatal()
@ 0x7f1dff93978d mesos::internal::master::Framework::untrackUnderRole()
@ 0x7f1dffad004b mesos::internal::master::Master::removeFramework()
@ 0x7f1dfface859 mesos::internal::master::Master::teardown()
@ 0x7f1dffa8ba25 mesos::internal::master::Master::receive()
@ 0x7f1dffb2f1cf ProtobufProcess<>::handlerMutM<>()
@ 0x7f1dffbe6809 std::__invoke_impl<>()
@ 0x7f1dffbdae22 std::__invoke<>()
@ 0x7f1dffbc8079 _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEES4_SD_St12_PlaceholderILi1EESO_ILi2EEEE6__callIvJS8_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x7f1dffbaaae5 std::_Bind<>::operator()<>()
@ 0x7f1dffb833c9 std::_Function_handler<>::_M_invoke()
@ 0x7f1dff330281 std::function<>::operator()()
@ 0x7f1dffb13329 ProtobufProcess<>::consume()
@ 0x7f1dffa85436 mesos::internal::master::Master::_consume()
@ 0x7f1dffa84ad5 mesos::internal::master::Master::consume()
@ 0x7f1dffafb9ae _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
@ 0x564c359f7002 process::ProcessBase::serve()
@ 0x7f1e023a7bbd process::ProcessManager::resume()
@ 0x7f1e023a407c _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
@ 0x7f1e023cf1ba _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
@ 0x7f1e023cd9c9 _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
@ 0x7f1e023cc482 _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
@ 0x7f1e023cb53b _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
@ 0x7f1e023ca3c4 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEEE6_M_runEv
@ 0x7f1e051f419d execute_native_thread_routine
@ 0x7f1df4200ea5 start_thread
@ 0x7f1df3f2996d __clone
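It looks like a fatal assertion (the "Check failed" in Framework::untrackUnderRole(), reached from Master::teardown() via Master::removeFramework()) is failing while the framework is being removed, but I have not been able to figure out the root cause.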
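To make my reading of the failed condition concrete, here is a minimal Python model of what the check appears to assert, based only on the log line and the stack trace above: when a framework is untracked under a role, none of its offered resources should still be allocated to that role. This is a sketch under that assumption, not the Mesos source. FrameworkModel, its fields, and demo() are hypothetical names, and I have not confirmed the real logic in framework.cpp.

```python
# Simplified, hypothetical model of the invariant named in the failed check.
# This is NOT the Mesos source; all class and field names are illustrative.
from dataclasses import dataclass, field


@dataclass
class FrameworkModel:
    # Each offered resource is modeled as (allocation_role, description).
    total_offered_resources: list = field(default_factory=list)
    roles: set = field(default_factory=set)

    def untrack_under_role(self, role: str) -> None:
        # Mirrors the shape of the logged condition:
        #   totalOfferedResources.filter(allocatedToRole).empty()
        still_offered = [r for r in self.total_offered_resources if r[0] == role]
        if still_offered:
            # A glog CHECK aborts the process; this model raises instead.
            raise AssertionError(
                f"Check failed: offers still outstanding for role {role!r}: "
                f"{still_offered}"
            )
        self.roles.discard(role)


def demo() -> tuple:
    # Case 1: nothing is offered under the role, so untracking succeeds.
    clean = FrameworkModel(roles={"spark"})
    clean.untrack_under_role("spark")

    # Case 2: an offer is still recorded under the role, so the check trips.
    dirty = FrameworkModel(
        total_offered_resources=[("spark", "cpus:2")], roles={"spark"}
    )
    try:
        dirty.untrack_under_role("spark")
        failed = False
    except AssertionError:
        failed = True
    return (clean.roles, failed)
```

If this reading is right, the crash would mean the master still had offered resources recorded for the framework's role at the moment teardown tried to untrack it, but I cannot tell from the trace alone how it got into that state.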