Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Sprint: Mesos Q3 Sprint 5
2
Description
In the past we've seen numerous issues with the cgroup freezer. Lately, on the 2.6.44 kernel, we've seen cases where we're unable to freeze the cgroup:
(1) An OOM occurs.
(2) There is no indication of the OOM in the kernel logs.
(3) The slave is unable to freeze the cgroup.
(4) The task is marked as lost.
I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 15488MB Maximum Used: 15488MB
MEMORY STATISTICS:
cache 7958691840
rss 8281653248
mapped_file 9474048
pgpgin 4487861
pgpgout 522933
pgfault 2533780
pgmajfault 11
inactive_anon 0
active_anon 8281653248
inactive_file 7631708160
active_file 326852608
unevictable 0
hierarchical_memory_limit 16240345088
total_cache 7958691840
total_rss 8281653248
total_mapped_file 9474048
total_pgpgin 4487861
total_pgpgout 522933
total_pgfault 2533780
total_pgmajfault 11
total_inactive_anon 0
total_active_anon 8281653248
total_inactive_file 7631728640
total_active_file 326852608
total_unevictable 0
I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource mem(*):1.62403e+10 and will be terminated
I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 'bbb9732a-d600-4c1b-b326-846338c608c3'
I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.710848ms
I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.588224ms
I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 2.15296ms
I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.643008ms
I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed age: 5.630238827780799days
I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.511168ms
I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for '/slave(1)/stats.json'
E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of framework '201104070004-0000002563-0000' failed: Failed to destroy container: discarded future
I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 from @0.0.0.0:0
I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' to 128MB for container bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:24.963541 25471 cpushare.cpp:338] Updated 'cpu.shares' to 256 (cpus 0.25) for container bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:24.964756 25471 cpushare.cpp:359] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 25ms (cpus 0.25) for container bbb9732a-d600-4c1b-b326-846338c608c3
I0903 16:47:43.406610 25476 status_update_manager.cpp:320] Received status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000
I0903 16:47:43.406991 25476 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000
I0903 16:47:43.410475 25476 status_update_manager.cpp:373] Forwarding status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 to master@<scrubbed_ip>:5050
I0903 16:47:43.439923 25480 status_update_manager.cpp:398] Received status update acknowledgement (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000
I0903 16:47:43.440115 25480 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000
I0903 16:47:43.443595 25480 slave.cpp:2709] Cleaning up executor 'E' of framework 201104070004-0000002563-0000
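A quick arithmetic check on the MEMORY STATISTICS above confirms that the container really did hit its limit, even though the kernel logged no OOM: cache plus rss adds up exactly to hierarchical_memory_limit, which is the 15488MB reported as both "Requested" and "Maximum Used" (and the 1.62403e+10 in the containerizer line). Roughly half of the charge (cache / limit is about 0.49) is page cache. The values below are copied from the log; nothing else is assumed.

```python
# Values copied verbatim from the MEMORY STATISTICS block in the log.
cache = 7958691840
rss = 8281653248
limit = 16240345088  # hierarchical_memory_limit

MiB = 1024 * 1024
usage = cache + rss

print(usage == limit)   # True: cache + rss account for the whole limit
print(limit // MiB)     # 15488, the "Requested" / "Maximum Used" figure in MB
```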
We should consider avoiding the freezer entirely in favor of a kill(2) loop. We don't have to wait for PID namespaces to remove the freezer dependency.
At the very least, when freezing fails, we should fall back to a kill(2) loop to ensure that the cgroup is destroyed.
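A minimal Python sketch of both proposals, assuming the cgroup v1 layout seen in the log (a freezer cgroup directory that contains `cgroup.procs` and `freezer.state`). The names `read_pids`, `kill_loop`, `try_freeze` and `destroy` are hypothetical; this is not Mesos's own C++ code. `kill_loop` is the freezer-free approach: without the freezer a process can fork between reading the PID list and sending SIGKILL, so the loop re-reads and re-kills until the cgroup is empty. `destroy` uses the freezer only as an optimization and, when freezing times out (the 10-second default is an assumption that mirrors the freeze/thaw interval in the log), falls through to the kill loop instead of failing the destroy.

```python
import os
import signal
import time


def read_pids(cgroup):
    """Return the PIDs listed in the cgroup's cgroup.procs file (cgroup v1)."""
    with open(os.path.join(cgroup, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def kill_loop(cgroup, max_rounds=50, interval=0.1, read=read_pids, kill=os.kill):
    """SIGKILL every process in the cgroup, repeating until it is empty.

    One pass is not enough without the freezer: a process may fork after we
    read the PID list, so we keep re-reading until no PIDs remain.
    Returns True if the cgroup emptied, False if it is still populated
    after max_rounds rounds.
    """
    for _ in range(max_rounds):
        pids = read(cgroup)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # Exited on its own after we read the list.
        time.sleep(interval)
    return not read(cgroup)


def try_freeze(freezer_cgroup, timeout=10.0, interval=0.1):
    """Request FROZEN via freezer.state; on timeout, thaw and return False."""
    state_file = os.path.join(freezer_cgroup, "freezer.state")
    with open(state_file, "w") as f:
        f.write("FROZEN")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(state_file) as f:
            if f.read().strip() == "FROZEN":
                return True
        time.sleep(interval)  # Kernel still reports FREEZING.
    with open(state_file, "w") as f:
        f.write("THAWED")  # Don't leave the tasks stuck in FREEZING.
    return False


def destroy(freezer_cgroup, use_freezer=True, read=read_pids, kill=os.kill):
    """Kill everything in the cgroup; a freezer failure is not fatal."""
    frozen = use_freezer and try_freeze(freezer_cgroup)
    if frozen:
        # Signal every process while none of them can fork, then thaw so
        # the signals can be acted on.
        for pid in read(freezer_cgroup):
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass
        with open(os.path.join(freezer_cgroup, "freezer.state"), "w") as f:
            f.write("THAWED")
    # Whether or not the freezer helped, finish with the kill loop.
    return kill_loop(freezer_cgroup, read=read, kill=kill)
```

Passing `use_freezer=False` gives the freezer-free variant. The `read` and `kill` parameters exist only so the loop can be exercised without a real cgroup.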
Issue Links
- is related to: MESOS-1765 Use PID namespace to avoid freezing cgroup (Resolved)
- relates to: MESOS-1689 Race with kernel to kill process / destroy cgroup after OOM (Resolved)