Mesos / MESOS-8416

CHECK failure when recovering nested containers if framework checkpointing is not enabled.

Details

    Description

      I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
      I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: vagrant-ubuntu-wily-64
      I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing sending task status updates
      I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from '/var/lib/mesos/slave/meta'
      I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
      I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering task status update manager
      I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
      F0108 23:05:25.331529 31759 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE 
      *** Check failure stack trace: ***
          @     0x7f769dbc98bd  google::LogMessage::Fail()
          @     0x7f769dbc8c8e  google::LogMessage::SendToLog()
          @     0x7f769dbc958d  google::LogMessage::Flush()
          @     0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
          @     0x556cb4c2b937  _CheckFatal::~_CheckFatal()
          @     0x7f769c5ac653  mesos::internal::slave::MesosContainerizerProcess::recover()
      

      If the framework does not enable checkpointing, no slave state is checkpointed for it. Its containers are still checkpointed in the runtime dir, however, so recovering a nested container triggers the CHECK failure because its parent's sandbox dir is unknown.
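
      The failure mode described above can be sketched in a few lines of C++. This is a simplified, hypothetical model and not the actual Mesos code: the `Container` struct, the function names, and the container IDs are assumptions made for illustration. It only shows how a nested container found in the runtime dir can end up with a parent whose sandbox dir was never recovered, which is the state an unconditional `CHECK_SOME` on that directory would abort on.

```cpp
#include <cassert>   // assert, used in the usage note below
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a recovered container.
struct Container {
  std::optional<std::string> parent;     // empty for a top-level container
  std::optional<std::string> directory;  // sandbox dir; empty if unknown
};

// Returns the IDs of nested containers whose parent sandbox dir is unknown.
// These are the cases where an unconditional check on the directory would
// fail; a tolerant recovery path would have to detect and handle them.
std::vector<std::string> nestedWithUnknownParentSandbox(
    const std::map<std::string, Container>& containers)
{
  std::vector<std::string> result;
  for (const auto& [id, container] : containers) {
    if (!container.parent.has_value()) {
      continue;  // top-level container, nothing to look up
    }
    auto parent = containers.find(*container.parent);
    if (parent == containers.end() ||
        !parent->second.directory.has_value()) {
      result.push_back(id);
    }
  }
  return result;
}

// Builds an illustrative recovered state (all IDs and paths are made up).
// "parent" belongs to a framework without checkpointing: it is known from
// the runtime dir only, so its sandbox dir is missing.
std::map<std::string, Container> exampleRecoveredState()
{
  std::map<std::string, Container> containers;
  containers["parent"] = Container{std::nullopt, std::nullopt};
  containers["parent.child"] = Container{"parent", std::nullopt};
  containers["ok"] = Container{std::nullopt, "/tmp/sandbox/ok"};
  containers["ok.child"] = Container{"ok", std::nullopt};
  return containers;
}
```

      With this model, `nestedWithUnknownParentSandbox(exampleRecoveredState())` reports only `parent.child`, so `assert(flagged.size() == 1 && flagged[0] == "parent.child")` holds for the returned vector. How the real containerizer should treat such containers during recovery is not stated in this report.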

          People

            Gilbert Song (gilbert)
            Qian Zhang
            Votes: 0
            Watchers: 4

              Agile

                Completed Sprint:
                Mesosphere Sprint 78 ended 30/Apr/18