Mesos / MESOS-1649

Network isolator should tolerate slave crashes while doing isolate/cleanup.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None

    Description

      A slave may crash while the network isolator is installing or removing filters during isolate/cleanup, leaving only some of a container's filters in place. Slave recovery for the network isolator should tolerate those partially installed filters. We also want to avoid leaking filters on the host eth0 and host lo interfaces.

      The current code cannot tolerate this, so recovery may fail with the following error:

      Failed to perform recovery: Collect failed: Failed to recover container d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found
      To remedy this do as follows:
      Step 1: rm -f /var/lib/mesos/meta/slaves/latest
             This ensures slave doesn't recover old live executors.
      Step 2: Restart the slave.
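
      To illustrate what tolerating partially installed filters could look like, here is a minimal C++ sketch. It is not the Mesos isolator code: `Filter`, `FilterTable`, `expectedFilters`, `removeIfExists`, and `cleanup` are hypothetical names, the in-memory set stands in for the real filters on host eth0 and host lo, and the assumption that each container has exactly one filter per host interface is a simplification. The idea is that removal is idempotent, so cleanup succeeds whether a crash left zero, one, or both filters behind.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical model of one filter on a host interface, keyed by the
// interface name and the container's ephemeral port range.
struct Filter {
  std::string link;  // "eth0" or "lo"
  int portBegin;
  int portEnd;

  bool operator<(const Filter& other) const {
    return std::tie(link, portBegin, portEnd) <
           std::tie(other.link, other.portBegin, other.portEnd);
  }
};

// Stand-in for the filters currently installed on the host (assumption:
// the real isolator would query the kernel instead).
using FilterTable = std::set<Filter>;

// Filters a fully isolated container is assumed to own in this sketch.
std::vector<Filter> expectedFilters(int portBegin, int portEnd) {
  return {Filter{"eth0", portBegin, portEnd},
          Filter{"lo", portBegin, portEnd}};
}

// Idempotent removal: a missing filter is not an error.
// Returns true only if a filter was actually removed.
bool removeIfExists(FilterTable& table, const Filter& filter) {
  return table.erase(filter) > 0;
}

// Cleanup that tolerates partial state. Every expected filter is
// attempted, so nothing is leaked on eth0 or lo even if a crash
// interrupted an earlier isolate or cleanup.
// Returns the number of filters that were removed.
int cleanup(FilterTable& table, int portBegin, int portEnd) {
  int removed = 0;
  for (const Filter& filter : expectedFilters(portBegin, portEnd)) {
    if (removeIfExists(table, filter)) {
      ++removed;
    }
  }
  return removed;
}
```

      With a table that holds only the eth0 filter (a crash between the two installs), `cleanup` removes that one filter and returns 1. Calling it again returns 0 rather than failing.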
      

          People

            jieyu Jie Yu