MESOS-770: Rate control and randomization of Replicated Log catching-up

Details

    • Type: Improvement
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: replicated log

    Description

      When the log is catching up, either during recovery or after a coordinator failover, the Paxos protocol is run on multiple positions (possibly the entire log).

      Currently the catch-up process is linear: one thread fills positions one by one. We do not catch up on all positions concurrently because too much concurrency could overload the network, and the problem may be exacerbated by contention among multiple recovering replicas and the coordinator.

      Rate control limits the number of positions for which a proposer (recoverer or coordinator) seeks consensus concurrently. Positions can be processed in batches of bounded size, one batch at a time.
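
The batching idea can be illustrated with a minimal Python sketch. It is not taken from the Mesos code base: `batches`, `catch_up`, and the `fill_batch` callback are hypothetical names, and `fill_batch` is assumed to run Paxos on every position in a batch concurrently and to return only once all of them are filled.

```python
from typing import Callable, Iterable, Iterator, List


def batches(positions: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `batch_size` log positions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[int] = []
    for position in positions:
        batch.append(position)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch


def catch_up(
    positions: Iterable[int],
    batch_size: int,
    fill_batch: Callable[[List[int]], None],
) -> None:
    """Fill positions one batch at a time.

    `fill_batch` is a hypothetical callback, assumed to block until every
    position in the batch has reached consensus. At most `batch_size`
    positions are therefore in flight at any moment.
    """
    for batch in batches(positions, batch_size):
        fill_batch(batch)
```

With `batch_size` set to 1 this degenerates to the current linear behavior; larger values trade network load for catch-up speed.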

      Randomly picking the positions in each batch reduces the chance that multiple proposers contend for the same position at the same time, which causes conflicts and retries.
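
One possible way to randomize batch membership is to shuffle the pending positions before slicing them into batches, as in the Python sketch below. This is an illustration under assumptions, not the design chosen for Mesos: `random_batches` is a hypothetical name, and each proposer is assumed to use its own independently seeded random number generator so that different proposers tend to visit positions in different orders.

```python
import random
from typing import Iterable, Iterator, List, Optional


def random_batches(
    positions: Iterable[int],
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> Iterator[List[int]]:
    """Yield batches of at most `batch_size` positions in random order.

    Every position appears in exactly one batch. `rng` is assumed to be
    private to the proposer; a fresh generator is created if none is given.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if rng is None:
        rng = random.Random()
    pending = list(positions)
    rng.shuffle(pending)  # Shuffles the local copy in place.
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]
```

Shuffling the whole pending set is the simplest choice; a variant that only randomizes within a sliding window of positions would keep the catch-up roughly in log order, at the cost of a higher chance of overlap between proposers.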

            People

              Assignee: Unassigned
              Reporter: Yan Xu (xujyan)
              Votes: 0
              Watchers: 3