  1. Mesos
  2. MESOS-7966

check for maintenance on agent causes fatal error


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.1, 1.5.0, 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: master
    • Sprint:
      Mesosphere Sprint 66
    • Story Points:
      5

      Description

      We interact with the maintenance API frequently to gracefully drain tasks from agents without impacting service availability.

      Interacting with the API sometimes seems to trigger a fatal error in the Mesos master. This happens relatively frequently, and it impacts us because downstream frameworks (Marathon) react badly to leader elections.

      Here is the log line that we see when the master dies:

      F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
      

      It's quite possible that we're using the maintenance API in the wrong way. We're happy to provide any other logs you need; please let me know what would be useful for debugging.

      Thanks.

            People

            • Assignee: Benno Evers (bennoe)
            • Reporter: Rob Johnson (robjohnson)
            • Shepherd: Vinod Kone
