Mesos / MESOS-8440

`network/ports` isolator kills legitimate tasks on recovery.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: containerization
    • Labels: None

    Description

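      At recovery time, the containerizer passes the `network/ports` isolator only the executor's resources, which do not include the task's ports. At update time, by contrast, the isolator receives the whole container's resources as calculated by Executor::allocatedResources(). The isolator therefore starts with an empty port allocation for the recovered container, and the ports check races against the subsequent resources update: if the check runs first, ports that the task legitimately holds look unallocated and the task is killed. In the log below, the container is recovered with ports [] and only later updated to [31446-31446].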

      I0112 08:22:23.930830 28937 linux_launcher.cpp:300] Recovered container 80a2d9dc-0492-4af5-a131-05f1cd66d672
      I0112 08:22:23.931637 28933 ports.cpp:398] recovering container executor_info {
        executor_id {
          value: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
        }
        resources {
          name: "cpus"
          type: SCALAR
          scalar {
            value: 0.1
          }
          allocation_info {
            role: "*"
          }
        }
        resources {
          name: "mem"
          type: SCALAR
          scalar {
            value: 32
          }
          allocation_info {
            role: "*"
          }
        }
        command {
          value: "/home/jpeach/src/mesos/build/src/mesos-executor"
          shell: false
          arguments: "mesos-executor"
          arguments: "--launcher_dir=/home/jpeach/src/mesos/build/src"
        }
        framework_id {
          value: "4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000"
        }
        name: "Command Executor (Task: fff42f68-4aed-4ca6-a62f-71b7166bbd7a) (Command: sh -c \'nc -k -l 31446\')"
        source: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
      }
      container_id {
        value: "80a2d9dc-0492-4af5-a131-05f1cd66d672"
      }
      pid: 28955
      directory: "/tmp/NetworkPortsIsolatorTest_ROOT_NC_RecoverGoodTask_eTlVKl/slaves/4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0/frameworks/4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000/executors/fff42f68-4aed-4ca6-a62f-71b7166bbd7a/runs/80a2d9dc-0492-4af5-a131-05f1cd66d672"
      I0112 08:22:23.932137 28933 ports.cpp:530] Updated ports to [] for container 80a2d9dc-0492-4af5-a131-05f1cd66d672
      I0112 08:22:23.932982 28937 provisioner.cpp:493] Provisioner recovery complete
      I0112 08:22:23.933924 28928 slave.cpp:6581] Sending reconnect request to executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000 at executor(1)@17.228.224.108:42187
      I0112 08:22:23.934587 28957 exec.cpp:282] Received reconnect request from agent 4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
      I0112 08:22:23.935724 28931 slave.cpp:4426] Received re-registration message from executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000
      I0112 08:22:23.936646 28967 exec.cpp:259] Executor re-registered on agent 4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
      I0112 08:22:23.936820 28929 ports.cpp:530] Updated ports to [31446-31446] for container 80a2d9dc-0492-4af5-a131-05f1cd66d672
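
The window between recovery and the first update can be illustrated with a small model. This is a minimal sketch, not the isolator's actual code: `PortsModel`, `PortRange`, `recover`, `update`, and `violations` are hypothetical names chosen for illustration, and the model assumes the ports check simply compares listening ports against the last recorded allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Inclusive port range, e.g. [31446-31446]. Hypothetical type.
struct PortRange {
  uint16_t begin;
  uint16_t end;
};

// Hypothetical stand-in for per-container port bookkeeping.
class PortsModel {
public:
  // Recovery: only the executor's resources (cpus, mem) are known,
  // so the container starts with an empty port allocation.
  void recover(const std::string& containerId) {
    allocated_[containerId] = {};
  }

  // Update: the whole container's resources arrive, including task ports.
  void update(const std::string& containerId, std::vector<PortRange> ranges) {
    for (const PortRange& range : ranges) {
      assert(range.begin <= range.end);
    }
    allocated_[containerId] = std::move(ranges);
  }

  // Ports check: returns the listening ports that fall outside the
  // recorded allocation. Unknown containers yield no violations.
  std::vector<uint16_t> violations(
      const std::string& containerId,
      const std::vector<uint16_t>& listening) const {
    std::vector<uint16_t> result;
    auto it = allocated_.find(containerId);
    if (it == allocated_.end()) {
      return result;
    }
    for (uint16_t port : listening) {
      bool covered = false;
      for (const PortRange& range : it->second) {
        if (port >= range.begin && port <= range.end) {
          covered = true;
          break;
        }
      }
      if (!covered) {
        result.push_back(port);
      }
    }
    return result;
  }

private:
  std::map<std::string, std::vector<PortRange>> allocated_;
};
```

In this model, calling `violations` for a container that listens on 31446 after `recover` but before `update` reports the port as unallocated, which mirrors the "Updated ports to []" window in the log; after `update` with [31446-31446] the same check reports nothing.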
      

          People

            Assignee: James Peach (jamespeach)
            Reporter: James Peach (jamespeach)