Mesos / MESOS-7872

Scheduler hang when registration fails.

    Details

    • Sprint:
      Mesosphere Sprint 61, Mesosphere Sprint 62
    • Story Points:
      5

      Description

      I'm finding that if framework registration fails, the Mesos driver client hangs indefinitely after logging the following output:

      I0809 20:04:22.479391    73 sched.cpp:1187] Got error ''FrameworkInfo.role' is not a valid role: Role '/test/role/slashes' cannot start with a slash'
      I0809 20:04:22.479658    73 sched.cpp:2055] Asked to abort the driver
      I0809 20:04:22.479843    73 sched.cpp:1233] Aborting framework 
      

      I'd have expected one or both of the following:

      • SchedulerDriver.run() should have returned a failure Protos.Status of some form
      • Scheduler.error() should have been invoked when the "Got error" message was logged
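
To make the expected contract concrete, here is a minimal sketch of the behaviour described in the two bullets above. Nothing in it calls Mesos: `FakeDriver`, `RecordingScheduler` and `on_master_error` are hypothetical stand-ins, and the `Status` enum only borrows the names of the driver status values.

```python
import enum
import threading


class Status(enum.Enum):
    # Stand-in enum; only the names are borrowed from the driver status values.
    DRIVER_RUNNING = enum.auto()
    DRIVER_ABORTED = enum.auto()


class RecordingScheduler:
    """Hypothetical scheduler that just remembers the errors it is handed."""

    def __init__(self):
        self.errors = []

    def error(self, driver, message):
        self.errors.append(message)


class FakeDriver:
    """Hypothetical stand-in for a scheduler driver; not the Mesos API."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.status = Status.DRIVER_RUNNING
        self._done = threading.Event()

    def on_master_error(self, message):
        # Expectation 2: the scheduler's error() callback is invoked.
        self.scheduler.error(self, message)
        # Expectation 1: the driver aborts, which unblocks run().
        self.status = Status.DRIVER_ABORTED
        self._done.set()

    def run(self, timeout=None):
        # Block until the driver finishes (or the optional timeout expires),
        # then report the final status to the caller.
        self._done.wait(timeout)
        return self.status
```

With this model, a caller that sees `run()` return `Status.DRIVER_ABORTED` can exit with a non-zero code instead of staying alive after a rejected registration.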

      Steps to reproduce:

      • Launch a scheduler instance and have it register with a known-bad FrameworkInfo. In this case, a role starting with a slash was used (/test/role/slashes).
      • Observe that the scheduler continues in a TASK_RUNNING state despite the failed registration. It appears that the Scheduler implementation isn't invoked at all.
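
As a way to confirm the bad input before launching, the one rule visible in the log above can be checked on the client side. This is a minimal sketch: `check_role` is a hypothetical helper, it covers only the leading-slash rule quoted in the "Got error" line, and the master applies further validation that is not mirrored here.

```python
def check_role(role):
    """Return an error string if the role starts with a slash, else None.

    Assumption: only the rule shown in the log output is checked. Passing
    this check does not guarantee that the master will accept the role.
    """
    if role.startswith("/"):
        return "Role '%s' cannot start with a slash" % role
    return None
```

Calling `check_role("/test/role/slashes")` returns an error string, so the role used in the reproduction is flagged before any registration attempt is made.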

      I'd guess that because this failure happens before framework registration, there's some error handling that isn't fully initialized at this point.

            People

            • Assignee:
              alexr Alex R
              Reporter:
              tillt Till Toenshoff
              Shepherd:
              Anand Mazumdar
