MESOS-676

Slave::reregistered LOG(FATAL)s due to being in RECOVERING state.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • None
    • 0.14.0
    • None
    • None

    Description

      void Slave::reregistered(const SlaveID& slaveId)
      {
        switch (state) {
          case DISCONNECTED:
            LOG(INFO) << "Re-registered with master " << master;

            state = RUNNING;
            if (!(info.id() == slaveId)) {
              EXIT(1) << "Re-registered but got wrong id: " << slaveId
                      << "(expected: " << info.id() << "). Committing suicide";
            }
            break;
          case RUNNING:
            // Already re-registered!
            if (!(info.id() == slaveId)) {
              EXIT(1) << "Re-registered but got wrong id: " << slaveId
                      << "(expected: " << info.id() << "). Committing suicide";
            }
            LOG(WARNING) << "Already re-registered with master " << master;
            break;
          case TERMINATING:
            LOG(WARNING) << "Ignoring re-registration because slave is terminating";
            break;
          case RECOVERING:
          default:
            LOG(FATAL) << "Unexpected slave state " << state;
            break;
        }
      }

      Saw a slave abort because RECOVERING falls through to the default case and hits LOG(FATAL):

      F0903 02:01:26.436521 42417 slave.cpp:672] Unexpected slave state 0
      *** Check failure stack trace: ***
          @ 0x7f042c579d8d google::LogMessage::Fail()
          @ 0x7f042c57dd77 google::LogMessage::SendToLog()
          @ 0x7f042c57c674 google::LogMessage::Flush()
          @ 0x7f042c57c8a6 google::LogMessageFatal::~LogMessageFatal()
          @ 0x7f042c21db8a mesos::internal::slave::Slave::reregistered()
          @ 0x7f042c276c1d ProtobufProcess<>::handler1<>()
          @ 0x7f042c24560a std::tr1::_Function_handler<>::_M_invoke()
          @ 0x7f042c27702b ProtobufProcess<>::visit()
          @ 0x7f042c46baf4 process::ProcessManager::resume()
          @ 0x7f042c46c54f process::schedule()
          @ 0x7f042bbd983d start_thread
          @ 0x7f042a5bbf8d clone
      /usr/local/bin/mesos-slave.sh: line 117: 42408 Aborted (core dumped) /usr/local/sbin/mesos-slave --port=5051 --resources="${MESOS_RESOURCES}" --attributes="${MESOS_ATTRIBUTES}" --master="${master_zoo_url}" --log_dir="${log_dir}" ${EXTRA_FLAGS} "$@"
      Slave Exit Status: 134
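
The failure path can be checked in isolation. Below is a minimal, side-effect-free model of the switch above, not the Mesos code itself. `Outcome`, the string ids, and `example()` are hypothetical stand-ins for the LOG/EXIT calls and `SlaveID`, and the enum order is an assumption inferred from the log, which prints 0 for the RECOVERING state.

```cpp
#include <cassert>
#include <string>

// Assumed enumerator order: RECOVERING comes first, so it prints as 0,
// matching "Unexpected slave state 0" in the log.
enum State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Hypothetical result type replacing LOG(...) and EXIT(1), so the model
// can be exercised without aborting the process.
enum class Outcome { Registered, AlreadyRegistered, Ignored, WrongId, Fatal };

// Mirrors the control flow of Slave::reregistered from the description.
Outcome reregistered(State& state,
                     const std::string& expectedId,
                     const std::string& receivedId)
{
  switch (state) {
    case DISCONNECTED:
      state = RUNNING;  // The original sets RUNNING before checking the id.
      return receivedId == expectedId ? Outcome::Registered : Outcome::WrongId;
    case RUNNING:
      return receivedId == expectedId ? Outcome::AlreadyRegistered
                                      : Outcome::WrongId;
    case TERMINATING:
      return Outcome::Ignored;
    case RECOVERING:  // No body of its own: falls through to default.
    default:
      return Outcome::Fatal;  // Stands in for LOG(FATAL).
  }
}

// Usage: a re-registration message that arrives during recovery is fatal,
// even when the slave id matches.
void example()
{
  State state = RECOVERING;
  assert(reregistered(state, "S1", "S1") == Outcome::Fatal);
  assert(state == RECOVERING);  // The state is left unchanged.
}
```

In this model the only way to reach `Outcome::Fatal` with a valid state is a re-registration message delivered while the slave is still RECOVERING, which is the situation the stack trace shows.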

          People

            bmahler Benjamin Mahler
            bmahler Benjamin Mahler