Mesos / MESOS-8416

CHECK failure when recovering nested containers if framework checkpointing is not enabled.


Details

    Description

      I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
      I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: vagrant-ubuntu-wily-64
      I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing sending task status updates
      I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from '/var/lib/mesos/slave/meta'
      I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
      I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering task status update manager
      I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
      F0108 23:05:25.331529 31759 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE 
      *** Check failure stack trace: ***
          @     0x7f769dbc98bd  google::LogMessage::Fail()
          @     0x7f769dbc8c8e  google::LogMessage::SendToLog()
          @     0x7f769dbc958d  google::LogMessage::Flush()
          @     0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
          @     0x556cb4c2b937  _CheckFatal::~_CheckFatal()
          @     0x7f769c5ac653  mesos::internal::slave::MesosContainerizerProcess::recover()
      

      If the framework does not enable checkpointing, no agent (slave) state is checkpointed for it. However, containers are still checkpointed in the runtime directory, so recovering a nested container hits the CHECK failure above: the parent's sandbox directory is unknown, and the nested container's sandbox cannot be derived from it.
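      A minimal C++ sketch of the failure mode described above, for illustration only. Everything in it is hypothetical: `RuntimeContainer`, `RecoveryResult`, and `recover()` are not Mesos APIs, the `<parent sandbox>/containers/<id>` layout is an assumption, and marking an unplaceable nested container as an orphan is one possible guard, not necessarily the fix adopted for this issue.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a container found in the runtime dir.
// This is NOT a Mesos type.
struct RuntimeContainer {
  std::string id;                     // container ID found in the runtime dir
  std::optional<std::string> parent;  // parent ID; empty for top-level containers
};

// Hypothetical result of recovery.
struct RecoveryResult {
  std::map<std::string, std::string> sandboxes;  // container ID -> sandbox dir
  std::vector<std::string> orphans;              // containers with no known sandbox
};

// `checkpointed` models sandbox directories known from checkpointed agent
// state; it is empty when the framework did not enable checkpointing.
// `runtime` models what was found in the runtime dir, which is written
// regardless. Assumes each parent is listed before its nested containers.
RecoveryResult recover(const std::map<std::string, std::string>& checkpointed,
                       const std::vector<RuntimeContainer>& runtime) {
  RecoveryResult result;
  for (const RuntimeContainer& c : runtime) {
    if (!c.parent) {
      // Top-level container: its sandbox is only known from agent state.
      auto it = checkpointed.find(c.id);
      if (it != checkpointed.end()) {
        result.sandboxes[c.id] = it->second;
      } else {
        result.orphans.push_back(c.id);
      }
      continue;
    }
    // Nested container: its sandbox is derived from the parent's. The failing
    // path effectively asserted this lookup succeeds (like CHECK_SOME); here an
    // unknown parent sandbox marks the nested container as an orphan instead.
    auto parent = result.sandboxes.find(*c.parent);
    if (parent == result.sandboxes.end()) {
      result.orphans.push_back(c.id);
    } else {
      result.sandboxes[c.id] = parent->second + "/containers/" + c.id;
    }
  }
  return result;
}
```

      With checkpointing disabled, `recover({}, ...)` on a parent and its nested container reports both as orphans rather than aborting the process; with the parent's sandbox checkpointed, the nested sandbox is derived from it.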


            People

              Gilbert Song
              Gilbert Song
              Qian Zhang
