MESOS-8630: All subsequent registry operations fail after the registrar is aborted after a failed update

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: master
    • Labels: None

    Description

      A failure to update the registry always aborts the registrar, but it does not always abort the master process.

      When the registrar fails to update the registry, it aborts the actor and fails all future operations. The rationale is explained in https://github.com/apache/mesos/commit/5eaf1eb346fc2f46c852c1246bdff12a89216b60:

      In this event, the Master won't commit suicide until the initial
      failure is processed. However, in the interim, subsequent operations
      are potentially being performed against the Registrar. This could lead
      to fighting between masters if a "demoted" master re-attempts to
      acquire log-leadership!
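The abort-and-fail-forever behavior described above can be sketched as a toy model. `ToyRegistrar`, its `apply` method, `RegistrarAborted`, and the `storage_write` callback are hypothetical names chosen for illustration; this is not the Mesos C++ implementation, only a minimal sketch assuming the registrar latches the first update failure and rejects every later operation with it.

```python
class RegistrarAborted(Exception):
    """Raised for the failing update and every operation after it."""


class ToyRegistrar:
    """Hypothetical model of abort-on-failure; not Mesos code."""

    def __init__(self, storage_write):
        # storage_write: callable that persists one operation, raises on failure.
        self._storage_write = storage_write
        self._error = None  # set once, on the first failed update

    def apply(self, operation):
        # Once aborted, every later operation fails with the original error,
        # so no further writes are attempted against storage.
        if self._error is not None:
            raise RegistrarAborted(self._error)
        try:
            self._storage_write(operation)
        except Exception as e:
            self._error = f"Failed to update registry: {e}"
            raise RegistrarAborted(self._error) from e
        return True
```

A single failed write is enough: after it, even operations whose writes would have succeeded are rejected.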

      However, when the registry update is requested through an operator API (maintenance, quota update, etc.), the master process does not shut down (a 500 error is returned to the client instead), and all subsequent registry operations fail!
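A second toy model contrasts the two master-side paths. `ToyRegistrar`, `ToyMaster`, `internal_update`, and `operator_update` are hypothetical names, not Mesos APIs. The sketch assumes that a failed internal registry update (agent admission is used as the example) terminates the master, while a failed operator API update is reported as an HTTP 500 and leaves the master running.

```python
class ToyRegistrar:
    """Hypothetical registrar that latches its first failure; not Mesos code."""

    def __init__(self, fail_on):
        self.fail_on = fail_on  # name of the operation whose write fails
        self.aborted = False

    def apply(self, op):
        # The first failure aborts the registrar; every later call fails too.
        if self.aborted or op == self.fail_on:
            self.aborted = True
            return False
        return True


class ToyMaster:
    """Hypothetical master showing the two failure paths."""

    def __init__(self, registrar):
        self.registrar = registrar
        self.running = True

    def internal_update(self, op):
        # Internal path (assumed example: agent admission): a failed
        # update is fatal, so the master stops running.
        if not self.registrar.apply(op):
            self.running = False
        return self.running

    def operator_update(self, op):
        # Operator API path (maintenance, quota, ...): the failure becomes
        # an HTTP 500, the master keeps running, the registrar stays aborted.
        return 200 if self.registrar.apply(op) else 500
```

After the first failed operator request, the toy master is still running, but its registrar rejects every later operation: the stuck state this issue describes.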


          People

            Xudong Ni (fiu)
            Yan Xu (xujyan)
            Yan Xu
            Votes: 0
            Watchers: 2
