Mesos: MESOS-7119

Mesos master crashes while accepting an inverse offer.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.1.2, 1.2.0
    • None

    Description

      We noticed a failing invariant check in the Mesos master that crashed the master while it was accepting an inverse offer. The master was built from HEAD at commit c7fc1377b33c4eb83a01167bdb53c102c06b9a99 (Jan 11): https://github.com/apache/mesos/commit/c7fc1377b33c4eb83a01167bdb53c102c06b9a99

      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.564393 27362 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0002
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.564457 27362 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0003
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.564517 27362 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0003
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.566793 27367 master.cpp:6664] Sending 1 offers to framework 2d45d0b7-0d58-43e4-9662-d876a100a055-0009 (hello-
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567001 27367 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0001
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567091 27367 master.cpp:6754] Sending 1 inverse offers to framework 2d45d0b7-0d58-43e4-9662-d876a100a055-0009
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567168 27367 master.cpp:6754] Sending 1 inverse offers to framework 2d45d0b7-0d58-43e4-9662-d876a100a055-0018
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567234 27367 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0061
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567322 27367 master.cpp:6754] Sending 1 inverse offers to framework 2d45d0b7-0d58-43e4-9662-d876a100a055-0012
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567405 27367 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0003
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567876 27363 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0061
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.567975 27363 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0062
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.568056 27363 master.cpp:6754] Sending 1 inverse offers to framework 98b4f7a3-fc41-48c8-a37d-ed85ed371929-0003
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: I0211 17:00:41.584126 27369 http.cpp:410] HTTP POST for /master/api/v1/scheduler from 10.10.0.68:41428
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: W0211 17:00:41.584228 27369 master.cpp:4601] Ignoring accept of inverse offer 01021b50-55f0-420e-8744-1ba1eceb3f55-O135611 s
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: F0211 17:00:41.584259 27369 master.cpp:4605] CHECK_SOME(slaveId): is NONE
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: *** Check failure stack trace: ***
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0da91ad  google::LogMessage::Fail()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0daafdd  google::LogMessage::SendToLog()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0da8d9c  google::LogMessage::Flush()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0dab8d9  google::LogMessageFatal::~LogMessageFatal()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af005d4a9  _CheckFatal::~_CheckFatal()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0285235  mesos::internal::master::Master::acceptInverseOffers()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af01f5bc9  mesos::internal::master::Master::Http::scheduler()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af024aa77  _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestERK6OptionISsEEZN5mesos8
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0d16840  _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBas
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af014effa  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8Resp
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0d1dca1  process::ProcessManager::resume()
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9af0d26ba7  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9aef1b6230  (unknown)
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9aee9d4dc5  start_thread
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal mesos-master[27357]: @     0x7f9aee70373d  __clone
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: main process exited, code=killed, status=6/ABRT
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal systemd[1]: Unit dcos-mesos-master.service entered failed state.
      Feb 11 17:00:41 ip-10-10-0-215.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service failed.
      

      Attaching the logs for the master too.
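      The log sequence hints at a likely failure mode: the master warns that it is ignoring the accept of inverse offer ...-O135611 (master.cpp:4601), and shortly afterwards CHECK_SOME(slaveId) fails because slaveId is NONE (master.cpp:4605). One plausible reading is that slaveId is only assigned when an accepted inverse offer is still known to the master, so when every accepted ID is stale it stays NONE and the fatal check aborts the whole process. The C++ sketch below models that reading together with a non-fatal alternative. It is hypothetical: it uses std::optional in place of stout's Option, and the names InverseOfferTable, agentForAccept, and handleAcceptInverseOffers are illustrative only, not the actual Mesos code or the committed fix.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model; NOT the real Mesos master code.
// Maps each inverse offer ID the master still tracks to its agent (slave) ID.
using InverseOfferTable = std::unordered_map<std::string, std::string>;

// Returns the agent ID of the last accepted inverse offer that is still
// live, or std::nullopt if every accepted ID is stale (or none were given).
std::optional<std::string> agentForAccept(
    const InverseOfferTable& live,
    const std::vector<std::string>& acceptedIds) {
  std::optional<std::string> slaveId;
  for (const std::string& id : acceptedIds) {
    auto it = live.find(id);
    if (it == live.end()) {
      // Analogous to the "Ignoring accept of inverse offer ..." warning.
      std::cerr << "Ignoring accept of inverse offer " << id << "\n";
      continue;
    }
    slaveId = it->second;
  }
  return slaveId;
}

// Non-fatal handling: when no accepted inverse offer is valid, fail only
// this call (return false) instead of asserting that slaveId is set.
bool handleAcceptInverseOffers(
    const InverseOfferTable& live,
    const std::vector<std::string>& acceptedIds) {
  std::optional<std::string> slaveId = agentForAccept(live, acceptedIds);
  if (!slaveId.has_value()) {
    // An unconditional CHECK_SOME(slaveId) at this point would abort the
    // entire master process, as seen in the log above.
    return false;
  }
  // ... further processing against agent *slaveId would go here ...
  return true;
}
```

      In this model, a scheduler that accepts an inverse offer the master has already rescinded or replaced gets a failed call, while the master itself stays up.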


          People: Anand Mazumdar (anandmazumdar), Joseph Wu