Mesos / MESOS-7376

Reduce copying of the Registry to improve Registrar performance.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: master
    • Labels:
      None

      Description

      During scale testing we discovered that, as the number of registered agents grows, the time it takes to update the registry quickly grows to unacceptable values. At some point it starts exceeding registry_store_timeout, yet the timeout doesn't fire.

      With 55k agents we saw this (registry_store_timeout=20secs):

      I0331 17:11:21.227442 36472 registrar.cpp:473] Applied 69 operations in 3.138843387secs; attempting to update the registry
      I0331 17:11:24.441409 36464 log.cpp:529] LogStorage.set: acquired the lock in 74461ns
      I0331 17:11:24.441541 36464 log.cpp:543] LogStorage.set: started in 51770ns
      I0331 17:11:26.869323 36462 log.cpp:628] LogStorage.set: wrote append at position=6420881 in 2.41043644secs
      I0331 17:11:26.869454 36462 state.hpp:179] State.store: storage.set has finished in 2.428189561secs (b=1)
      I0331 17:11:56.199453 36469 registrar.cpp:518] Successfully updated the registry in 34.971944192secs
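
      Reading only the timestamps above: the "Successfully updated" line is logged about 29.3 secs after State.store reports that storage.set has finished, so most of the 34.97 secs falls outside the storage write itself. The helper below is a small sketch for doing that arithmetic on glog-style lines; the function name glogMicros is hypothetical and is not part of Mesos.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Mesos): extracts the "HH:MM:SS.uuuuuu"
// field from a glog-style line such as
//   "I0331 17:11:21.227442 36472 registrar.cpp:473] ..."
// and returns microseconds since midnight, or -1 if the line does not match.
long long glogMicros(const std::string& line)
{
  char severity = 0;
  int monthDay = 0, hours = 0, minutes = 0, seconds = 0, micros = 0;

  // The leading space in the format skips any indentation before the
  // severity letter; "%4d" consumes the MMDD field.
  const int matched = std::sscanf(
      line.c_str(),
      " %c%4d %2d:%2d:%2d.%6d",
      &severity, &monthDay, &hours, &minutes, &seconds, &micros);

  if (matched != 6) {
    return -1;
  }

  return ((hours * 60LL + minutes) * 60 + seconds) * 1000000 + micros;
}
```

      Subtracting the value for the State.store line from the value for the "Successfully updated" line gives the gap between the two events in microseconds. Both lines must fall on the same day, since the date field is ignored.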
      

      This is caused by repeated copying of the Registry: each copy duplicates a large object graph and takes roughly 0.4 sec with 55k agents.
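
      To put the per-copy cost in perspective: at roughly 0.4 sec per copy, 50 copies alone take as long as the 20-second registry_store_timeout. The sketch below illustrates, under stated assumptions, why copying the whole structure for every operation scales poorly compared with mutating a single working copy. The Registry struct, the Operation type and all function names here are hypothetical simplifications; this does not reproduce the actual Registrar code or the actual fix.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for the Registry:
// one string per registered agent instead of a protobuf object graph.
struct Registry
{
  std::vector<std::string> agents;
};

// An operation mutates the registry it is handed.
using Operation = std::function<void(Registry*)>;

// Illustrative operation: admit one agent.
Operation admitAgent(std::string id)
{
  return [id](Registry* registry) { registry->agents.push_back(id); };
}

// Copy-heavy variant: each operation works on a fresh copy of the whole
// registry. With k operations and n agents, copying costs O(k * n).
// `copies` is incremented by hand at every full copy.
Registry applyWithCopyPerOperation(
    const Registry& current,
    const std::vector<Operation>& operations,
    std::size_t* copies)
{
  Registry result = current;
  ++*copies;

  for (const Operation& operation : operations) {
    Registry scratch = result; // Full copy for every operation.
    ++*copies;
    operation(&scratch);
    result = std::move(scratch); // A move, not another copy.
  }

  return result;
}

// Reduced-copy variant: one working copy, mutated in place by every
// operation, so copying costs O(n) regardless of the batch size.
Registry applyInPlace(
    const Registry& current,
    const std::vector<Operation>& operations,
    std::size_t* copies)
{
  Registry result = current;
  ++*copies;

  for (const Operation& operation : operations) {
    operation(&result);
  }

  return result;
}
```

      For a batch of 69 operations, the first variant makes 70 full copies and the second makes 1, while both produce the same final registry.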

            People

            • Assignee:
              ipronin Ilya Pronin
              Reporter:
              ipronin Ilya Pronin
              Shepherd:
              Benjamin Mahler
            • Votes:
              0
              Watchers:
              3
