Mesos / MESOS-8004

Failed to kill all processes in the container due to cgroup freeze failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: agent, containerization
    • Environment: CentOS Linux release 7.2.1511 (Core)
      3.10.0-327.36.3.el7.x86_64
      Mesos 1.2.1

    Description

      When using the Mesos unified containerizer, an executor cannot be destroyed because the cgroup freeze operation fails. The agent logs show that the launcher tries to freeze the cgroup several times and then times out. However, /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8/freezer.state contains "FROZEN".
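
      One way to cross-check the reported state by hand is to read freezer.state together with the scheduler state of every process still in the cgroup. The following is a minimal diagnostic sketch, not anything Mesos itself runs. It assumes a cgroup v1 freezer hierarchy like the one in the path above; the function name `inspect_freezer` and its `proc_root` parameter are hypothetical.

```python
from pathlib import Path


def inspect_freezer(cgroup_dir, proc_root="/proc"):
    """Return (freezer_state, {pid: process_state}) for a cgroup v1 freezer dir.

    Assumption: cgroup_dir holds the standard cgroup v1 files
    `freezer.state` and `cgroup.procs`.
    """
    cgroup = Path(cgroup_dir)
    # Typically THAWED, FREEZING or FROZEN.
    state = (cgroup / "freezer.state").read_text().strip()
    pids = [int(p) for p in (cgroup / "cgroup.procs").read_text().split()]

    proc_states = {}
    for pid in pids:
        status_file = Path(proc_root) / str(pid) / "status"
        try:
            for line in status_file.read_text().splitlines():
                # Example line: "State:\tD (disk sleep)"
                if line.startswith("State:"):
                    proc_states[pid] = line.split(":", 1)[1].strip()
                    break
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between listing and inspection.
            proc_states[pid] = "gone"
    return state, proc_states
```

      Calling `inspect_freezer("/sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8")` on the affected agent would return the freezer state next to the per-process states, which makes it easier to see which processes, if any, remain in the cgroup.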

      I0921 18:00:58.339440 3493 containerizer.cpp:2465] Container e2778ccd-c7e5-4289-b382-e05f063200d8 has exited
      I0921 18:00:58.339519 3493 containerizer.cpp:2102] Destroying container e2778ccd-c7e5-4289-b382-e05f063200d8 in RUNNING state
      I0921 18:00:58.339645 3484 linux_launcher.cpp:505] Asked to destroy container e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:00:58.340553 3484 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:00:58.342226 3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:00.042708 3475 slave.cpp:5155] Killing executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
      I0921 18:01:02.009097 3483 process.cpp:3704] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/containers'
      W0921 18:01:02.011672 3491 containerizer.cpp:2055] Skipping status for container e2778ccd-c7e5-4289-b382-e05f063200d8 because: Container does not exist
      I0921 18:01:04.269701 3487 slave.cpp:5732] Querying resource estimator for oversubscribable resources
      I0921 18:01:04.269775 3487 slave.cpp:5266] Current disk usage 0.11%. Max allowed age: 6.292478769607581days
      I0921 18:01:04.270349 3506 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
      I0921 18:01:08.300772 3474 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
      I0921 18:01:08.345176 3517 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:08.347452 3517 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.183168ms
      I0921 18:01:08.347561 3517 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      E0921 18:01:15.192441 3524 perf_event.cpp:176] Perf sample of 10secs failed to complete within 12secs; sampling will be halted
      E0921 18:01:15.192819 3489 perf_event.cpp:199] Failed to get the perf sample: timeout
      I0921 18:01:18.350342 3488 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:18.352532 3488 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.121984ms
      I0921 18:01:18.352646 3481 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:19.301443 3520 slave.cpp:5732] Querying resource estimator for oversubscribable resources
      I0921 18:01:19.301566 3501 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
      I0921 18:01:23.307291 3518 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
      I0921 18:01:28.121094 3491 process.cpp:3704] Handling HTTP event for process 'metrics' with path: '/metrics/snapshot'
      I0921 18:01:28.355551 3493 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:28.357792 3493 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.177024ms
      I0921 18:01:28.357890 3493 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:34.302625 3503 slave.cpp:5732] Querying resource estimator for oversubscribable resources
      I0921 18:01:34.302738 3483 slave.cpp:5746] Received oversubscribable resources {} from the resource estimator
      I0921 18:01:38.315979 3505 slave.cpp:4346] Received ping from slave-observer(30)@10.16.85.66:5050
      I0921 18:01:38.360709 3511 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:38.362891 3511 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.12608ms
      I0921 18:01:38.362993 3475 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8

      I0921 18:01:48.366251 3492 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8
      I0921 18:01:48.368404 3496 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8 after 2.080256ms
      I0921 18:01:48.368501 3496 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e2778ccd-c7e5-4289-b382-e05f063200d8

      E0921 18:01:58.342779 3478 slave.cpp:4746] Termination of executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 failed: Failed to kill all processes in the container: Timed out after 1mins
      I0921 18:01:58.342830 3478 slave.cpp:4868] Cleaning up executor '47eb9350-9ab4-41f8-a5cd-39e855532b53' of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110 at executor(1)@172.29.0.18:40108
      I0921 18:01:58.364516 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8' for gc 6.99999578195556days in the future
      I0921 18:01:58.364591 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53' for gc 6.9999957811437days in the future
      I0921 18:01:58.364604 3478 slave.cpp:4956] Cleaning up framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
      I0921 18:01:58.364615 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53/runs/e2778ccd-c7e5-4289-b382-e05f063200d8' for gc 6.99999578062519days in the future
      I0921 18:01:58.364670 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110/executors/47eb9350-9ab4-41f8-a5cd-39e855532b53' for gc 6.99999578024296days in the future
      I0921 18:01:58.364683 3479 status_update_manager.cpp:285] Closing status update streams for framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
      I0921 18:01:58.364702 3475 gc.cpp:55] Scheduling '/data/mesos/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110' for gc 6.9999957791437days in the future
      I0921 18:01:58.364725 3479 status_update_manager.cpp:531] Cleaning up status update stream for task 47eb9350-9ab4-41f8-a5cd-39e855532b53 of framework 23aad131-26f7-44fd-9baa-dfb55e3e3926-0110
      I0921 18:01:58.364740 3475 gc.cpp:55] Scheduling '/data/mesos/meta/slaves/23aad131-26f7-44fd-9baa-dfb55e3e3926-S5/frameworks/23aad131-26f7-44fd-9baa-dfb55e3e3926-0110' for gc 6.99999577881778days in the future
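
      The log above contains six "Freezing cgroup" lines for the container, each followed roughly 10 seconds later by a "Thawing cgroup" line, except the last, which precedes the 1-minute timeout. A small script can tally that pattern from an agent log. This is a sketch only: the function name `freeze_attempts` is hypothetical, the regular expression assumes the glog line layout shown above, and timestamps are compared by time of day, so it assumes all lines fall on the same day.

```python
import re
from datetime import datetime

# Matches glog lines such as:
# I0921 18:00:58.342226 3493 cgroups.cpp:2692] Freezing cgroup /sys/...
LOG_LINE = re.compile(
    r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+\]\s+(.*)$"
)


def freeze_attempts(log_lines, container_id):
    """Return one entry per freeze attempt for the given container.

    Each entry is the number of seconds between a "Freezing cgroup" line
    and the next "Thawing cgroup" line, or None if no thaw followed.
    """
    pending = None  # time of a freeze that has not been thawed yet
    durations = []
    for raw in log_lines:
        match = LOG_LINE.match(raw.strip())
        if not match or container_id not in match.group(2):
            continue
        when = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        message = match.group(2)
        if message.startswith("Freezing cgroup"):
            if pending is not None:
                durations.append(None)  # previous freeze never thawed
            pending = when
        elif message.startswith("Thawing cgroup") and pending is not None:
            durations.append((when - pending).total_seconds())
            pending = None
    if pending is not None:
        durations.append(None)
    return durations
```

      Passing the lines of the agent log and the container ID `e2778ccd-c7e5-4289-b382-e05f063200d8` yields one duration per attempt, so a repeated value near 10 seconds followed by a trailing `None` corresponds to the retry-then-timeout behaviour described in this report.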

            People

              Assignee: Unassigned
              Reporter: Haiwei Zhou (highfly)