Mesos / MESOS-7872

Scheduler hang when registration fails.


Details

    • Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62
    • 5

    Description

      I'm finding that if framework registration fails, the Mesos scheduler driver hangs indefinitely after logging the following output:

      I0809 20:04:22.479391    73 sched.cpp:1187] Got error ''FrameworkInfo.role' is not a valid role: Role '/test/role/slashes' cannot start with a slash'
      I0809 20:04:22.479658    73 sched.cpp:2055] Asked to abort the driver
      I0809 20:04:22.479843    73 sched.cpp:1233] Aborting framework 
      

      I'd have expected one or both of the following:

      • SchedulerDriver.run() should have exited with a failed Protos.Status of some form (for example, DRIVER_ABORTED)
      • Scheduler.error() should have been invoked when the "Got error" message was logged
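
The two expectations above can be modeled with a toy driver. This is only a sketch of the contract being asked for, not the Mesos API: `ToyDriver`, `RecordingScheduler`, and `on_registration_error` are hypothetical names, and only the `Status` member names are borrowed from `Protos.Status`.

```python
import enum
import threading


class Status(enum.Enum):
    # Member names mirror the driver states in Mesos' Protos.Status.
    DRIVER_NOT_STARTED = enum.auto()
    DRIVER_RUNNING = enum.auto()
    DRIVER_ABORTED = enum.auto()
    DRIVER_STOPPED = enum.auto()


class RecordingScheduler:
    """Hypothetical scheduler that only records error() callbacks."""

    def __init__(self):
        self.errors = []

    def error(self, driver, message):
        self.errors.append(message)


class ToyDriver:
    """Hypothetical stand-in for a scheduler driver (not the Mesos API)."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.status = Status.DRIVER_NOT_STARTED
        self._lock = threading.Lock()
        self._done = threading.Event()

    def run(self):
        # Mark the driver as running unless it was already aborted,
        # then block until it finishes and report the final status.
        with self._lock:
            if self.status is Status.DRIVER_NOT_STARTED:
                self.status = Status.DRIVER_RUNNING
        self._done.wait()
        return self.status

    def on_registration_error(self, message):
        # Expectation 2: surface the error to the scheduler.
        self.scheduler.error(self, message)
        # Expectation 1: abort and unblock run() so the caller sees a failure.
        with self._lock:
            self.status = Status.DRIVER_ABORTED
        self._done.set()


scheduler = RecordingScheduler()
driver = ToyDriver(scheduler)
result = []
worker = threading.Thread(target=lambda: result.append(driver.run()))
worker.start()
driver.on_registration_error(
    "Role '/test/role/slashes' cannot start with a slash")
worker.join(timeout=5)
```

In this model `run()` returns `Status.DRIVER_ABORTED` and the scheduler has seen the error message, so the caller can exit instead of hanging.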

      Steps to reproduce:

      • Launch a scheduler instance and have it register with a known-bad FrameworkInfo. In this case, a role with a leading slash ('/test/role/slashes') was used.
      • Observe that the scheduler continues in a TASK_RUNNING state despite the failed registration. From all appearances, the Scheduler implementation isn't invoked at all.
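
Until the hang is fixed, a framework could check the role itself before handing it to the driver. The sketch below is a hypothetical client-side pre-check, not Mesos' own validation: only the leading-slash rule is taken from the log above, and the empty-role and trailing-slash rules are assumptions.

```python
def role_problems(role):
    """Return a list of reasons why `role` is likely to be rejected."""
    problems = []
    if role == "":
        # Assumption: an empty role is rejected.
        problems.append("role is empty")
    if role.startswith("/"):
        # Rule taken from the "Got error" log line in this report.
        problems.append("role cannot start with a slash")
    if role.endswith("/"):
        # Assumption: a trailing slash is rejected as well.
        problems.append("role cannot end with a slash")
    return problems


bad = role_problems("/test/role/slashes")
good = role_problems("test/role/slashes")
```

Here `bad` contains the leading-slash problem and `good` is empty, so the framework can fail fast with a clear message rather than rely on the driver to report the error.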

      I'd guess that because this failure happens before framework registration, there's some error handling that isn't fully initialized at this point.


          People

            alexr Alex R
            tillt Till Toenshoff
            Anand Mazumdar Anand Mazumdar