  1. Mesos
  2. MESOS-9147

Agent and scheduler driver authentication retry backoff time could overflow.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.1, 1.6.1
    • Fix Version/s: 1.4.3, 1.5.2, 1.6.2, 1.7.0
    • Component/s: None
    • Labels:
    • Target Version/s:
    • Sprint:
      Mesosphere Sprint 2018-26, Mesosphere Sprint 2018-27
    • Story Points:
      3

      Description

      In the agent we have the following retry backoff calculation logic:

      https://github.com/apache/mesos/blob/874c752316b14055c0a5a7b67f97ccf912abcc3c/src/slave/slave.cpp#L1401-L1418

          Duration backoff =
            flags.authentication_backoff_factor * std::pow(2, failedAuthentications);
      

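      Since `Duration` uses an `int64_t` to hold nanoseconds, if we set `authentication_backoff_factor` to 1 second, the calculation overflows once `failedAuthentications` reaches 34: converting seconds to nanoseconds costs about 30 bits (10^9 is roughly 2^30), and `pow()` contributes another 34 bits (2^34), which together exceed the 63 value bits of an `int64_t`.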
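
The arithmetic above can be checked with a small standalone sketch. It does not use the Mesos `Duration` class; it only assumes, as the description states, that the value is a nanosecond count held in an `int64_t`. The function name `firstOverflowingExponent` is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Mesos): returns the smallest exponent n
// for which factorNs * 2^n no longer fits in an int64_t nanosecond count.
int firstOverflowingExponent(int64_t factorNs)
{
  assert(factorNs > 0);

  const int64_t max = std::numeric_limits<int64_t>::max();

  int n = 0;
  int64_t value = factorNs;

  // Doubling is safe only while the current value is at most max / 2.
  while (value <= max / 2) {
    value *= 2;
    ++n;
  }

  // `value` equals factorNs * 2^n and still fits; one more doubling would not.
  return n + 1;
}
```

For a factor of 1 second (1000000000 ns) the function returns 34, which matches the threshold given in the description.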

      The effect is that we do not back off at all; we just retry immediately after the 5s timeout:
      https://github.com/apache/mesos/blob/874c752316b14055c0a5a7b67f97ccf912abcc3c/src/master/master.cpp#L9615-L9619

      The scheduler driver also has the same issue.

      We should also audit all the other backoff logic.
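
One possible shape for overflow-safe backoff logic is sketched below. This is an assumption for illustration, not the fix that was committed for this issue: it uses `std::chrono::nanoseconds` in place of the Mesos `Duration` type, and the helper name `cappedBackoff` and its `maxBackoff` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

using std::chrono::nanoseconds;

// Hypothetical helper (not the Mesos API): computes factor * 2^failures,
// clamped to maxBackoff, without overflowing the underlying tick count.
nanoseconds cappedBackoff(
    nanoseconds factor,
    nanoseconds maxBackoff,
    uint32_t failures)
{
  assert(factor.count() > 0 && maxBackoff.count() > 0);

  nanoseconds backoff = factor;

  for (uint32_t i = 0; i < failures; ++i) {
    // If doubling would exceed the cap, stop early. This check also
    // guarantees that the multiplication below can never overflow.
    if (backoff > maxBackoff / 2) {
      return maxBackoff;
    }
    backoff *= 2;
  }

  return std::min(backoff, maxBackoff);
}
```

With a factor of 1 second and a cap of 1 minute, 3 failures give 8 seconds, while 34 or more failures give the 1 minute cap instead of an overflowed value. The loop exits as soon as the cap is reached, so a very large failure count does not cost extra iterations.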

              People

              • Assignee:
                mzhu Meng Zhu
                Reporter:
                mzhu Meng Zhu
                Shepherd:
                Benjamin Mahler
              • Votes:
                1
                Watchers:
                4
