[MESOS-8416] CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.1, 1.6.0
Component/s: containerization
Labels: containerizer, mesosphere
Target Version/s: 1.5.1, 1.6.0
Epic Link: Unified Container
Sprint: Mesosphere Sprint 78
Story Points: 5

Description

I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: vagrant-ubuntu-wily-64
I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing sending task status updates
I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from '/var/lib/mesos/slave/meta'
I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering task status update manager
I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
F0108 23:05:25.331529 31759 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE 
*** Check failure stack trace: ***
    @     0x7f769dbc98bd  google::LogMessage::Fail()
    @     0x7f769dbc8c8e  google::LogMessage::SendToLog()
    @     0x7f769dbc958d  google::LogMessage::Flush()
    @     0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
    @     0x556cb4c2b937  _CheckFatal::~_CheckFatal()
    @     0x7f769c5ac653  mesos::internal::slave::MesosContainerizerProcess::recover()

If a framework does not enable checkpointing, no slave state is checkpointed for it. Its containers, however, are still checkpointed in the runtime dir, so recovering a nested container triggers the CHECK failure because its parent's sandbox dir is unknown.
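The failure mode can be illustrated with a small model. This is only a sketch under stated assumptions, not the Mesos code or the patch that resolved this issue: the function name `sandbox_directory`, the `strict` flag, and the `<parent sandbox>/containers/<id>` layout used for nested sandboxes are all illustrative. It shows how resolving a nested container's sandbox from its parent's breaks when the parent's directory was never checkpointed.

```python
from typing import Dict, Optional


def sandbox_directory(
    container_id: str,
    parents: Dict[str, Optional[str]],
    checkpointed: Dict[str, str],
    strict: bool,
) -> Optional[str]:
    """Resolve a container's sandbox directory during recovery.

    parents:      container id -> parent id (None for top-level), as it
                  would be read back from the runtime dir.
    checkpointed: top-level container id -> sandbox dir, as it would be
                  read back from the checkpointed slave state. Empty when
                  the framework did not enable checkpointing.
    strict:       hypothetical switch; True mimics the fatal CHECK,
                  False tolerates an unknown parent directory.
    """
    parent_id = parents[container_id]
    if parent_id is None:
        # Top-level container: the directory is known only if it was
        # checkpointed in the slave state.
        return checkpointed.get(container_id)

    parent_dir = sandbox_directory(parent_id, parents, checkpointed, strict)
    if parent_dir is None:
        if strict:
            # Stands in for the fatal CHECK_SOME(container->directory).
            raise RuntimeError("CHECK_SOME(container->directory): is NONE")
        return None

    # Assumed layout for a nested container's sandbox.
    return f"{parent_dir}/containers/{container_id}"


parents = {"parent": None, "child": "parent"}

# Checkpointing enabled: the parent's sandbox is known.
print(sandbox_directory("child", parents, {"parent": "/sandbox/parent"}, True))

# Checkpointing disabled: the tolerant variant reports an unknown directory
# instead of aborting.
print(sandbox_directory("child", parents, {}, False))
```

With `strict=True` and an empty `checkpointed` map, the same call raises, which corresponds to the agent aborting during `MesosContainerizerProcess::recover()` in the log above.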

Issue Links

is duplicated by: MESOS-8278 Mesos Containerizer cannot recover due to check failure. (Resolved)

People

Assignee: Gilbert Song
Reporter: Gilbert Song
Shepherd: Qian Zhang
Votes: 0
Watchers: 4

Dates

Created: 08/Jan/18 23:35
Updated: 14/Jun/18 19:58
Resolved: 17/Apr/18 18:59

Agile

Completed Sprint: Mesosphere Sprint 78 (ended 30/Apr/18)
