MESOS-6337

Nested containers getting killed before network isolation can be applied to them.

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • containerization
    • Linux

    Description

      Seeing this odd behavior in one of our clusters:
      ```
      http.cpp:1948] Failed to launch nested container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: Collect failed: Failed to seed container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: Collect failed: Failed to setup hostname and network files: Failed to enter the mount namespace of pid 21591: Pid 21591 does not exist
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894485 31531 containerizer.cpp:1931] Destroying container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e in ISOLATING state
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894439 31531 containerizer.cpp:2300] Container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e has exited
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.854456 31534 systemd.cpp:96] Assigned child process '21591' to 'mesos_executors.slice'
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831861 21580 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL socket
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831526 21580 openssl.cpp:432] Will only verify peer certificate if presented!
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831521 21580 openssl.cpp:426] Will not verify peer certificate!
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831511 21580 openssl.cpp:421] CA directory path unspecified! NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=<dirpath>
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831405 21580 openssl.cpp:399] Failed SSL connections will be downgraded to a non-SSL socket
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: WARNING: Logging before InitGoogleLogging() is written to STDERR
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.828413 21581 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL socket
      Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
      ```
      The log above is in reverse chronological order, so please read it from the bottom up.
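
For readers who prefer the events oldest first, a minimal sketch of flipping such a newest-first excerpt is shown here. The helper name `chronological` and its `needle` parameter are hypothetical, and the sample entries are abridged from the log above.

```python
def chronological(lines, needle=None):
    """Return newest-first log lines in oldest-first order.

    If `needle` is given, keep only the lines that contain it
    (for example a container ID or a PID).
    """
    ordered = list(reversed(lines))
    if needle is not None:
        ordered = [line for line in ordered if needle in line]
    return ordered


# Three entries abridged from the log above, newest first.
newest_first = [
    "containerizer.cpp:1931] Destroying container ... in ISOLATING state",
    "containerizer.cpp:2300] Container ... has exited",
    "systemd.cpp:96] Assigned child process '21591' to 'mesos_executors.slice'",
]

for line in chronological(newest_first):
    print(line)
```

Read this way, the sequence is: the child process is assigned to the slice, the container exits, and only then is it destroyed while still in the ISOLATING state.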

      The relevant log is:
      ```
      http.cpp:1948] Failed to launch nested container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: Collect failed: Failed to seed container cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: Collect failed: Failed to setup hostname and network files: Failed to enter the mount namespace of pid 21591: Pid 21591 does not exist
      ```
      It looks like the nested container failed to launch because the `isolate` call to the `network/cni` isolator failed. By the time the isolator received the `isolate` call, the nested container's process (PID 21591) had apparently already exited, so the isolator could not enter its mount namespace to set up the hostname and network files.
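
To illustrate the failure mode, here is a rough sketch (not the Mesos implementation) of why a vanished PID makes its mount namespace unreachable. On Linux the namespace handle lives at `/proc/<pid>/ns/mnt`, so once the process is gone there is nothing left to open. The function name `mount_namespace_path` and the `proc_root` parameter are hypothetical; `proc_root` exists only so the sketch can be exercised against a temporary directory instead of the real `/proc`.

```python
import os
import tempfile


def mount_namespace_path(pid, proc_root="/proc"):
    """Return the path of the mount-namespace handle for `pid`.

    Raises ProcessLookupError if the process directory is missing,
    which is the situation the error message above describes.
    """
    if not os.path.isdir(os.path.join(proc_root, str(pid))):
        raise ProcessLookupError(f"Pid {pid} does not exist")
    return os.path.join(proc_root, str(pid), "ns", "mnt")


# Exercise the helper against a fake /proc that contains only pid 100.
with tempfile.TemporaryDirectory() as fake_proc:
    os.makedirs(os.path.join(fake_proc, "100", "ns"))
    alive = mount_namespace_path(100, proc_root=fake_proc)
    try:
        mount_namespace_path(21591, proc_root=fake_proc)
        error = None
    except ProcessLookupError as exc:
        error = str(exc)

print(alive)
print(error)
```

Note that a check like this is inherently racy: the process can still exit between the existence check and the actual attempt to enter the namespace, so it narrows the window rather than closing it.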

      The odd thing is that the nested container should have been frozen at this point, and hence not running, so I'm not sure what killed it. My suspicion falls on systemd, since I also see this log message:
      ```
      Oct 07 18:02:31 ip-10-10-0-207 mesos-agent[31520]: I1007 18:02:31.473656 31532 systemd.cpp:96] Assigned child process '1596' to 'mesos_executors.slice'
      ```
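
One way to gather evidence for or against this suspicion would be to inspect the cgroup membership of a still-running container process through `/proc/<pid>/cgroup`. The sketch below parses that file's `hierarchy-ID:controller-list:cgroup-path` format. The helper names and the sample content are hypothetical and not taken from the affected agent.

```python
def parse_cgroup(text):
    """Parse /proc/<pid>/cgroup content into {controller-list: path}.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def in_slice(text, slice_name):
    """Return True if any cgroup path has `slice_name` as a path component."""
    return any(
        slice_name in path.split("/")
        for path in parse_cgroup(text).values()
    )


# Hypothetical sample content, not captured from the affected agent.
sample = "3:memory:/mesos/abc\n1:name=systemd:/mesos_executors.slice\n"
print(in_slice(sample, "mesos_executors.slice"))
```

This only shows where a process currently sits in the cgroup hierarchy. It would not by itself prove that systemd killed the container, and it cannot be run against a PID that has already exited.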

            People

              gilbert Gilbert Song
              avinash.mesos Avinash Sridharan
              Jie Yu Jie Yu