MESOS-9476

XFS project IDs aren't released upon task completion

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: agent
    • Environment: CentOS 7.1, Mesos 1.7

    Description

      The XFS disk isolator (disk/xfs) doesn't release project IDs when a task finishes on Mesos 1.7 (branch 1.7.x). Once every project ID in the configured range is taken, launching new tasks fails with:

      Failed to assign project ID, range exhausted
      

       

      Attached is a Vagrant configuration that sets up a VM with an XFS disk (mounted on /var/opt/mesos), ZooKeeper 3.4.12, Mesos 1.7 and Marathon 1.6.

      Once the box is ready, start ZooKeeper, mesos-master, mesos-agent (with its work directory on the XFS disk) and Marathon:

      sudo bin/zkServer.sh start
      
      sudo /home/vagrant/mesos/build/bin/mesos-master.sh --ip=192.168.33.10 --work_dir=/mnt/mesos
      
      sudo /home/vagrant/mesos/build/bin/mesos-agent.sh --master=192.168.33.10:5050 --work_dir=/var/opt/mesos --enforce_container_disk_quota --isolation=disk/xfs --xfs_project_range=[5000-5009]
      
      sudo MESOS_NATIVE_JAVA_LIBRARY="/home/vagrant/mesos/build/src/.libs/libmesos.so" sbt 'run --master 192.168.33.10:5050 --zk zk://localhost:2181/marathon'
      

       

      Create an app on Marathon, for example:

      {"id": "/test", "cmd": "sleep 3600", "cpus": 0.01, "mem": 32, "disk": 1, "instances": 5}  
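The app definition above can also be submitted programmatically. The following is a minimal sketch, assuming Marathon's HTTP API is reachable on port 8080 of the Vagrant box (the issue does not state the port, so `MARATHON_URL` is an assumption). The helper names `create_app_request` and `scale_app_request` are hypothetical; they only build requests for Marathon's `/v2/apps` endpoints and do not send them.

```python
import json
import urllib.request

# Assumption: Marathon listens on port 8080 of the box; adjust as needed.
MARATHON_URL = "http://192.168.33.10:8080"

# App definition copied from the issue.
APP = {"id": "/test", "cmd": "sleep 3600", "cpus": 0.01,
       "mem": 32, "disk": 1, "instances": 5}


def create_app_request(base_url, app):
    """Build (but do not send) a POST /v2/apps request for the app definition."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scale_app_request(base_url, app_id, instances):
    """Build a PUT /v2/apps/<id> request that only changes the instance count."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/apps/" + app_id.strip("/"),
        data=json.dumps({"instances": instances}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Sending needs a running Marathon, so the call stays commented out here:
    # urllib.request.urlopen(create_app_request(MARATHON_URL, APP))
    print(create_app_request(MARATHON_URL, APP).full_url)
```

Passing the result of `scale_app_request(MARATHON_URL, "/test", 0)` and then `scale_app_request(MARATHON_URL, "/test", 8)` to `urllib.request.urlopen` would correspond to the scale-down and scale-up steps of the reproduction.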
      

       

      You should see 5 project IDs being used:

      $ sudo xfs_quota -x -c "report -a -n -L 5000 -U 5009" | grep '^#[1-9][0-9]*'
      #5000 4 1024 1024 00 [--------]
      #5001 4 1024 1024 00 [--------]
      #5002 4 1024 1024 00 [--------]
      #5003 4 1024 1024 00 [--------]
      #5004 4 1024 1024 00 [--------]
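To track the leak across scale-up and scale-down cycles, the report can be reduced to the list of project IDs it contains. This is a small sketch, assuming the report text has the same line layout as the output above; `used_project_ids` is a hypothetical helper, and like the `grep` pattern it ignores project `#0`.

```python
import re

# Matches report lines such as "#5000 4 1024 1024 00 [--------]".
PROJECT_LINE = re.compile(r"^#([1-9][0-9]*)\b")


def used_project_ids(report, first=5000, last=5009):
    """Return the sorted project IDs in [first, last] listed in the report."""
    ids = set()
    for line in report.splitlines():
        match = PROJECT_LINE.match(line.strip())
        if match and first <= int(match.group(1)) <= last:
            ids.add(int(match.group(1)))
    return sorted(ids)


SAMPLE = """\
#0 0 0 0 00 [--------]
#5000 4 1024 1024 00 [--------]
#5001 4 1024 1024 00 [--------]
#5002 4 1024 1024 00 [--------]
#5003 4 1024 1024 00 [--------]
#5004 4 1024 1024 00 [--------]
"""

if __name__ == "__main__":
    print(used_project_ids(SAMPLE))
```

Running the helper on the report before and after scaling to 0 instances makes it easy to see whether the list shrinks.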
      

       

      If you scale down to 0 instances, the project IDs aren't released.

      If you scale back up to 8 instances, only 5 of them start, because only 5 of the 10 IDs in the range are still free. The remaining 3 fail with errors like this:

      E1213 14:38:36.190430 20813 slave.cpp:6204] Container '064b8a6b-c42d-4905-b2a7-632318aa2b83' for executor 'test.c5e88a67-fee4-11e8-9cc6-0800278a1a98' of framework 0473e272-04f7-4b1d-ae1d-f7177940e295-0000 failed to start: Failed to assign project ID, range exhausted
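The arithmetic behind the 5-of-8 result can be illustrated with a toy model. This is not Mesos code: `ProjectIdPool`, `launch` and `reproduce` are hypothetical names, and the model only assumes that each container takes one ID from the range `[5000-5009]` and that the ID is either returned or leaked when the container exits.

```python
class ProjectIdPool:
    """Toy model of a fixed range of project IDs (not the Mesos isolator)."""

    def __init__(self, first, last):
        self.free = list(range(first, last + 1))
        self.assigned = {}

    def assign(self, container_id):
        if not self.free:
            raise RuntimeError("Failed to assign project ID, range exhausted")
        project_id = self.free.pop(0)
        self.assigned[container_id] = project_id
        return project_id

    def release(self, container_id):
        self.free.append(self.assigned.pop(container_id))


def launch(pool, names):
    """Try to launch each container; return (started, failed) name lists."""
    started, failed = [], []
    for name in names:
        try:
            pool.assign(name)
            started.append(name)
        except RuntimeError:
            failed.append(name)
    return started, failed


def reproduce(release_on_exit):
    """Run 5 tasks, stop them, then run 8; return (started, failed) counts."""
    pool = ProjectIdPool(5000, 5009)  # mirrors --xfs_project_range=[5000-5009]
    first_run, _ = launch(pool, [f"a{i}" for i in range(5)])
    if release_on_exit:  # the behaviour the reporter observed on Mesos 1.4
        for name in first_run:
            pool.release(name)
    started, failed = launch(pool, [f"b{i}" for i in range(8)])
    return len(started), len(failed)


if __name__ == "__main__":
    print("IDs leaked:  ", reproduce(release_on_exit=False))
    print("IDs released:", reproduce(release_on_exit=True))
```

With leaked IDs the model starts 5 containers and fails 3, matching the report; with IDs released on exit all 8 start.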
      

       

      I also tested on Mesos 1.4, where the project IDs are properly released when the task finishes. I haven't tested other versions.

      Attachments

        1. build.sh
          3 kB
          Omar AitMous
        2. Vagrantfile
          1.0 kB
          Omar AitMous

          People

            Assignee: Unassigned
            Reporter: Omar AitMous (o.aitmous)
            Votes: 0
            Watchers: 3
