  1. Mesos
  2. MESOS-8963

Executor crash trying to print container ID.

Details

    Description

      As observed in an internal cluster:

      mesos-default-executor: /pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:112: T& Option<T>::get() & [with T = mesos::ContainerID]: Assertion `isSome()' failed.
      *** Aborted at 1527514147 (unix time) try "date -d @1527514147" if you are using GNU date ***
      PC: @     0x7f9fe3b5c1f7 (unknown)
      *** SIGABRT (@0x6300000005) received by PID 5 (TID 0x7f9fdfe8e700) from PID 5; stack trace: ***
          @     0x7f9fe3ef95e0 (unknown)
          @     0x7f9fe3b5c1f7 (unknown)
          @     0x7f9fe3b5d8e8 (unknown)
          @     0x7f9fe3b55266 (unknown)
          @     0x7f9fe3b55312 (unknown)
          @     0x7f9fe581b9b0 _ZNR6OptionIN5mesos11ContainerIDEE3getEv.part.134
          @     0x7f9fe58a19f5 _ZZN5mesos8internal6checks14CheckerProcess18nestedCommandCheckEvENKUlRKN7process4http8ResponseEE0_clES7_
          @     0x7f9fe66a8edc process::ProcessManager::resume()
          @     0x7f9fe66ae856 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
          @     0x7f9fe46d32b0 (unknown)
          @     0x7f9fe3ef1e25 (unknown)
          @     0x7f9fe3c1f34d (unknown)
      

      The issue is caused by this block in CheckerProcess, which does not check that previousCheckContainerId is still some after it has yielded control:

      // checker_process.cpp:649
      LOG(WARNING) << "Connection to remove the nested container '"
                   << previousCheckContainerId.get() << "' used for the "
                   << name << " for task '" << taskId << "' failed: "
                   << failure;
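
      One possible way to avoid the failed assertion is to copy the container ID before yielding control, so that the log message does not depend on the member still being set when the continuation runs. The following is a minimal sketch under that assumption, not the actual Mesos patch: it uses std::optional (C++17) as a stand-in for stout's Option, and the Checker struct and onRemoveFailure() helper are hypothetical names introduced only for illustration.

```cpp
#include <functional>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the checker state; not the real CheckerProcess.
struct Checker {
  // std::optional plays the role of stout's Option<ContainerID> here.
  std::optional<std::string> previousCheckContainerId;

  // Builds the continuation that runs when the removal connection fails.
  // The ID is copied into the lambda up front, so the message no longer
  // reads the member after control has been yielded.
  std::function<std::string(const std::string&)> onRemoveFailure() const {
    // Fall back to a placeholder instead of asserting when the ID is unset.
    const std::string id = previousCheckContainerId.value_or("<unknown>");

    return [id](const std::string& failure) {
      std::ostringstream out;
      out << "Connection to remove the nested container '" << id
          << "' failed: " << failure;
      return out.str();
    };
  }
};
```

      In this sketch, resetting previousCheckContainerId between creating the continuation and invoking it has no effect on the message, because the lambda owns its own copy of the ID. An alternative with the same effect would be to test the option inside the continuation and log without the ID when it is none.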
      

          People

            bennoe Benno Evers
            bennoe Benno Evers
            Alex R Alex R