Mesos / MESOS-10085

Operator API events are silently dropped on transient authorization failures.


Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved

    Description

      One of the purposes of the operator V1 API events is to allow subscribers to maintain an up-to-date view of the master's state: in response to a SUBSCRIBE call, the subscriber first receives an initial view of the master's state and then receives updates to that view in the form of `Event`s.
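The subscriber-side bookkeeping described above can be sketched as follows. This is a minimal, hypothetical model (the `TaskEvent`, `EventType` and `TaskView` names are illustrative, not Mesos types; real events are `mesos::v1::master::Event` protobufs): the view is seeded from the initial snapshot and then updated by events, and an update for a task the subscriber has never seen is reported as an inconsistency.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for the two task events discussed here.
enum class EventType { TASK_ADDED, TASK_UPDATED };

struct TaskEvent {
  EventType type;
  std::string taskId;
  std::string state;  // e.g. "TASK_RUNNING"
};

// Subscriber-side view of the master's tasks: seeded from the initial
// snapshot received after SUBSCRIBE, then kept current by applying events.
class TaskView {
 public:
  explicit TaskView(std::map<std::string, std::string> snapshot)
      : tasks_(std::move(snapshot)) {}

  // Applies one event. Returns false if the event cannot be reconciled with
  // the current view, i.e. an update for a task that was never added.
  bool apply(const TaskEvent& event) {
    switch (event.type) {
      case EventType::TASK_ADDED:
        tasks_[event.taskId] = event.state;
        return true;
      case EventType::TASK_UPDATED: {
        auto it = tasks_.find(event.taskId);
        if (it == tasks_.end()) {
          return false;  // Gap: neither in the snapshot nor in a TASK_ADDED.
        }
        it->second = event.state;
        return true;
      }
    }
    return false;  // Unreachable for valid EventType values.
  }

  const std::map<std::string, std::string>& tasks() const { return tasks_; }

 private:
  std::map<std::string, std::string> tasks_;
};
```

A subscriber built this way can only detect a gap after the fact; it has no way to recover the missing state without a fresh snapshot.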

      The parts of the state (and of the updates to it) that the subscriber's principal is not authorized to see are filtered out based on the result of the `objectApprover::approve()` method.

      In case of an authorization failure (as opposed to an authorization denial), `approve()` returns an Error.
      Currently, the event filtering code handles `false` (i.e., not authorized) and an Error in the same way: the event is dropped.
      (See https://github.com/apache/mesos/blob/f8a3dd334934094ec44e07fa350f958d218bc78f/src/common/http.hpp#L414 and, for example, https://github.com/apache/mesos/blob/f8a3dd334934094ec44e07fa350f958d218bc78f/src/master/master.cpp#L12257 )
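The current behavior can be modeled roughly as below. This is a sketch under assumptions: `ApprovalResult` is a hypothetical stand-in for stout's `Try<bool>`, and `approvedCollapsingErrors()` / `filterEvents()` are illustrative names, not the functions at the linked lines. The point it shows is that an Error is collapsed into `false`, so a TASK_ADDED hit by a transient failure is dropped while the subsequent TASK_UPDATED is still delivered.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for Try<bool>: either a decision
// (true = authorized, false = not authorized) or an authorization error.
struct AuthzError {
  std::string message;
};
using ApprovalResult = std::variant<bool, AuthzError>;

// Models the behavior described in this issue: an error is treated exactly
// like "not authorized", so the caller cannot tell the two apart.
bool approvedCollapsingErrors(const ApprovalResult& result) {
  if (std::holds_alternative<AuthzError>(result)) {
    return false;  // Error silently becomes a denial.
  }
  return std::get<bool>(result);
}

// Returns the events a subscriber would actually receive: any event whose
// approval was false *or* an error is silently dropped.
std::vector<std::string> filterEvents(
    const std::vector<std::pair<std::string, ApprovalResult>>& events) {
  std::vector<std::string> delivered;
  for (const auto& [event, approval] : events) {
    if (approvedCollapsingErrors(approval)) {
      delivered.push_back(event);
    }
  }
  return delivered;
}
```

With a transient error on the TASK_ADDED for `t1` and a successful approval for the following TASK_UPDATED, only the orphaned TASK_UPDATED reaches the subscriber.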

      In the presence of transient authorization failures, this can lead to inconsistencies in the event stream. The simplest example is receiving a TASK_UPDATED event without ever receiving a TASK_ADDED event for the task in question.
      Such inconsistencies may leave the subscriber unable to maintain a correct view of the master's state.

      One option for fixing this issue is to disconnect the subscriber on an authorization failure, so that it receives the master's full state again when it resubscribes.
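The disconnect-on-error option could look roughly like this. It is only a sketch of the proposal, not existing Mesos code: `ApprovalResult`, `Decision`, `Subscriber` and `stream()` are hypothetical names. The three outcomes are kept distinct, and the first authorization error closes the subscription instead of silently dropping the event.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for Try<bool> (decision or authorization error).
struct AuthzError {
  std::string message;
};
using ApprovalResult = std::variant<bool, AuthzError>;

enum class Decision { DELIVER, DROP, DISCONNECT };

// Keeps "authorized", "not authorized" and "authorization failed" distinct.
Decision decide(const ApprovalResult& result) {
  if (std::holds_alternative<AuthzError>(result)) {
    return Decision::DISCONNECT;  // Real code would also log the error.
  }
  return std::get<bool>(result) ? Decision::DELIVER : Decision::DROP;
}

// Hypothetical subscriber: collects delivered events until disconnected.
struct Subscriber {
  std::vector<std::string> received;
  bool connected = true;
};

// Streams events, stopping at the first authorization error so that the
// subscriber must resubscribe and obtain a fresh, consistent snapshot.
void stream(Subscriber& subscriber,
            const std::vector<std::pair<std::string, ApprovalResult>>& events) {
  if (!subscriber.connected) {
    return;
  }
  for (const auto& [event, approval] : events) {
    switch (decide(approval)) {
      case Decision::DELIVER:
        subscriber.received.push_back(event);
        break;
      case Decision::DROP:
        break;  // Genuinely unauthorized: filtering is the intended behavior.
      case Decision::DISCONNECT:
        subscriber.connected = false;
        return;
    }
  }
}
```

The trade-off is that a transient authorizer outage forces reconnects and full snapshots, but the subscriber never sees a TASK_UPDATED that its view cannot explain.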

      Note that this issue also existed before synchronous authorization was introduced (i.e., in Mesos 1.9 and earlier), but there the transient errors occurred in the `Authorizer::getObjectApprover()` method, which was then called once per event (as opposed to once per subscriber after synchronous authorization was introduced).

      A similar issue is present in the processing of Operator API calls, including the SUBSCRIBE call: objects are silently dropped on transient authorization failures (see MESOS-10099).


          People

            dzhu Dong Zhu
            asekretenko Andrei Sekretenko
            Andrei Sekretenko Andrei Sekretenko

