  1. Mesos
  2. MESOS-9476

XFS project IDs aren't released upon task completion

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: agent
    • Environment: CentOS 7.1, Mesos 1.7

    Description

      The XFS disk isolator (disk/xfs) does not release project IDs when a task finishes on Mesos 1.7 (branch 1.7.x). Once every project ID in the configured range is taken, launching new tasks fails with:

      Failed to assign project ID, range exhausted
      

       

      Attached is a Vagrant configuration that sets up a VM with an XFS disk (mounted on /var/opt/mesos), ZooKeeper 3.4.12, Mesos 1.7 and Marathon 1.6.

      Once the box is ready, start ZooKeeper, mesos-master, mesos-agent (using the XFS disk) and Marathon:

      sudo bin/zkServer.sh start
      
      sudo /home/vagrant/mesos/build/bin/mesos-master.sh --ip=192.168.33.10 --work_dir=/mnt/mesos
      
      sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 --work_dir=/var/opt/mesos --enforce_container_disk_quota --isolation=disk/xfs --xfs_project_range=[5000-5009]
      
      sudo MESOS_NATIVE_JAVA_LIBRARY="/home/vagrant/mesos/build/src/.libs/libmesos.so" sbt 'run --master 192.168.33.10:5050 --zk zk://localhost:2181/marathon'
      

       

      Create an app on Marathon, for example:

      {"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32, "disk": 1, "instances": 5}  
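
One way to submit this definition is through Marathon's REST API rather than the web UI. The following is a minimal sketch, assuming Marathon listens on its default HTTP port 8080 inside the VM; the MARATHON_URL variable and the create_app function are illustrative names, not part of the attached setup.

```shell
#!/bin/sh
# Assumption: Marathon's HTTP API is reachable at this address; adjust as needed.
MARATHON_URL="${MARATHON_URL:-http://localhost:8080}"

# App definition taken from this report.
APP_JSON='{"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32, "disk": 1, "instances": 5}'

# Hypothetical helper: POST the app definition to Marathon's /v2/apps endpoint.
create_app() {
  curl -sS -X POST "$MARATHON_URL/v2/apps" \
    -H 'Content-Type: application/json' \
    -d "$APP_JSON"
}
```

Call create_app once Marathon is up; nothing is sent until the function is invoked.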
      

       

      You should see 5 project IDs being used:

      $ sudo xfs_quota -x -c "report -a -n -L 5000 -U 5009" | grep '^#[1-9][0-9]*'
      #5000 4 1024 1024 00 [--------]
      #5001 4 1024 1024 00 [--------]
      #5002 4 1024 1024 00 [--------]
      #5003 4 1024 1024 00 [--------]
      #5004 4 1024 1024 00 [--------]
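
To compare usage before and after scaling, it can help to reduce the report to a single number. The sketch below reuses the grep pattern from the command above; count_project_ids is a hypothetical helper name that reads an xfs_quota report on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count the report lines that start with '#' followed by
# a non-zero project ID. 'grep -c' exits non-zero when nothing matches, so
# '|| true' keeps the function usable under 'set -e'.
count_project_ids() {
  grep -c '^#[1-9][0-9]*' || true
}
```

For example: sudo xfs_quota -x -c "report -a -n -L 5000 -U 5009" | count_project_ids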
      

       

      If you scale down to 0 instances, the project IDs aren't released.

      If you then scale back up to 8 instances, only 5 of them start: the range [5000-5009] holds 10 IDs, and 5 are still held by the finished tasks. The remaining 3 fail with errors like this:

      E1213 14:38:36.190430 20813 slave.cpp:6204] Container '064b8a6b-c42d-4905-b2a7-632318aa2b83' for executor 'test.c5e88a67-fee4-11e8-9cc6-0800278a1a98' of framework 0473e272-04f7-4b1d-ae1d-f7177940e295-0000 failed to start: Failed to assign project ID, range exhausted
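
The scale-down and scale-up steps can also be scripted against Marathon's REST API. This is a sketch only, assuming Marathon is reachable on port 8080 and that PUT /v2/apps/test accepts a partial update containing just the instance count; scale_body and scale_app are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: Marathon's HTTP API is reachable at this address; adjust as needed.
MARATHON_URL="${MARATHON_URL:-http://localhost:8080}"

# Hypothetical helper: build the JSON body that sets the instance count.
scale_body() {
  printf '{"instances": %d}' "$1"
}

# Hypothetical helper: PUT the new instance count for the /test app.
scale_app() {
  curl -sS -X PUT "$MARATHON_URL/v2/apps/test" \
    -H 'Content-Type: application/json' \
    -d "$(scale_body "$1")"
}
```

Running scale_app 0, waiting for the tasks to stop, and then running scale_app 8 corresponds to the steps described above.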
      

       

      I've also tested on Mesos 1.4, where the project IDs are properly released when the task finishes.

      (I haven't tested other versions.)

      Attachments

        1. Vagrantfile
          1.0 kB
          Omar AitMous
        2. build.sh
          3 kB
          Omar AitMous

          People

            Assignee: Unassigned
            Reporter: Omar AitMous (o.aitmous)