  1. Mesos
  2. MESOS-9109

Windows agent uses reserved character : (colon) in file names and crashes when attempting to remove link

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: agent

    Description

      I have a hybrid cluster running Mesos Agents on Windows, and I am using Chronos to launch jobs on Windows Agents.

      Chronos uses the character : (colon) internally when spawning jobs. The Windows Mesos Agent spawns those jobs and creates the paths on disk, but when the job terminates and it attempts to remove the link, it crashes with the following error message:

        

      I0719 09:20:00.621385 14788 gc.cpp:129] Unscheduling 'D:\ws\mes-wd\meta\slaves\5563b512-518e-44c6-bdc1-3c927d0622da-S1\frameworks\77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000\executors\ct:1532006400000:0:sample-child-job-lv2:' from gc
      I0719 09:20:00.622387 24124 slave.cpp:2406] Authorizing task 'ct:1532006400000:0:sample-child-job2:' for framework 77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000
      I0719 09:20:00.630340 24124 slave.cpp:2406] Authorizing task 'ct:1532006400000:0:sample-child-job-lv2:' for framework 77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000
      I0719 09:20:00.644341 24124 slave.cpp:2849] Launching task 'ct:1532006400000:0:sample-child-job2:' for framework 77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000
      I0719 09:20:00.649345 24124 paths.cpp:748] Creating sandbox 'D:\ws\mes-wd\slaves\5563b512-518e-44c6-bdc1-3c927d0622da-S1\frameworks\77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000\executors\ct:1532006400000:0:sample-child-job2:\runs\cecbf7ab-ace3-4f45-a208-9c104f69624c'
      F0719 09:20:00.653342 24124 paths.cpp:763] CHECK_SOME(os::rm(latest)): The filename, directory name, or volume label syntax is incorrect.
      Failed to remove latest symlink 'D:\ws\mes-wd\slaves\5563b512-518e-44c6-bdc1-3c927d0622da-S1\frameworks\77a0fb6f-3c43-4d7b-ae16-af2dfd728567-0000\executors\ct:1532006400000:0:sample-child-job2:\runs\latest'
      *** Check failure stack trace: ***
      

       

      The problem seems to be the job name: 

      'ct:1532006400000:0:sample-child-job2:'
      

      Chronos uses : (colon) internally, which is a reserved character in Windows file names (see https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file).

      I believe it's the responsibility of the agent to check and sanitize the task names against restricted characters.
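
      To illustrate the kind of sanitization meant here, the following is a minimal sketch, not actual Mesos code. The helper name `encodeForWindowsPath` is hypothetical, and percent-encoding is only one possible scheme. It assumes the agent would apply it to each ID before using that ID as a path component. It does not handle reserved device names (such as CON or NUL) or names ending in a space or period.

```cpp
#include <string>

// Hypothetical helper, not part of Mesos: percent-encodes characters that
// Windows reserves in file and directory names, so that an ID (for example
// a task or executor ID) can be used as a single path component.
std::string encodeForWindowsPath(const std::string& id)
{
  // Reserved characters listed on the Microsoft file naming page, plus '%'
  // itself so that the encoding remains reversible.
  static const std::string reserved = "<>:\"/\\|?*%";
  static const char hex[] = "0123456789ABCDEF";

  std::string result;
  result.reserve(id.size());

  for (unsigned char c : id) {
    // Control characters (values below 32) are also not allowed in names.
    if (c < 32 || reserved.find(static_cast<char>(c)) != std::string::npos) {
      result += '%';
      result += hex[c >> 4];
      result += hex[c & 0x0F];
    } else {
      result += static_cast<char>(c);
    }
  }

  return result;
}
```

      With this sketch, the task name above would map to 'ct%3A1532006400000%3A0%3Asample-child-job2%3A', which contains no reserved characters. Any such encoding would also need to be applied consistently wherever the agent reads these paths back, for example during recovery.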

          People

            Unassigned
            edis Constantin Eduard Staniloiu
            Andrew Schwartzmeyer