  1. Mesos
  2. MESOS-770

Rate control and randomization of Replicated Log catching-up


    Details

    • Type: Improvement
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: replicated log
    • Labels:

      Description

      When the log is catching up, either during recovery or after a coordinator failover, the Paxos protocol is run on multiple positions (possibly the entire log).

      Currently the catch-up process is linear: one thread fills positions one by one. We do not catch up all positions concurrently because too much concurrency could overload the network, and the problem may be exacerbated by contention between multiple recovering replicas and the coordinator.

      Rate control limits the number of positions for which a proposer (recoverer or coordinator) seeks consensus concurrently. We can fill the log in batches, each covering a bounded number of positions.
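
A minimal sketch of the batching idea, not the Mesos implementation: `batches` and `batch_size` are hypothetical names introduced for illustration. It splits the positions still to be filled into groups of bounded size, so that a proposer would have at most `batch_size` consensus rounds in flight at once.

```python
from typing import Iterator, List, Sequence


def batches(positions: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `batch_size` positions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(positions), batch_size):
        yield list(positions[start:start + batch_size])


# Positions 0..9 still need to be filled; at most 4 are attempted per batch.
for batch in batches(range(10), batch_size=4):
    print(batch)
```

A small `batch_size` is gentler on the network but lengthens catch-up; the right value is a tuning decision this sketch does not try to make.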

      Picking the positions in each batch at random reduces the chance that multiple proposers contend for the same position at the same time, which causes conflicts and retries.
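
The randomized selection could look roughly like the sketch below. All names (`pick_random_batch`, `catch_up`, `fill`) are assumptions made for illustration and do not come from the Mesos code base; `fill` stands in for running a Paxos round at one position. For simplicity the positions within a batch are filled sequentially here, whereas a real proposer would run them concurrently.

```python
import random
from typing import Callable, List, Optional, Set


def pick_random_batch(missing: Set[int], batch_size: int,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Pick up to `batch_size` distinct missing positions uniformly at random."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng or random.Random()
    count = min(batch_size, len(missing))
    # random.sample needs a sequence; sorting also makes seeded runs reproducible.
    return rng.sample(sorted(missing), count)


def catch_up(missing: Set[int], batch_size: int,
             fill: Callable[[int], None],
             rng: Optional[random.Random] = None) -> None:
    """Fill every missing position, one random batch at a time."""
    remaining = set(missing)  # copy so the caller's set is not modified
    while remaining:
        batch = pick_random_batch(remaining, batch_size, rng)
        for position in batch:
            fill(position)  # hypothetical stand-in for a consensus round
        remaining.difference_update(batch)
```

Because each proposer draws its own random batch, two proposers working on the same gap are less likely to start with the same positions than if both proceeded in ascending order.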

              People

              • Assignee: Unassigned
              • Reporter: Yan Xu (xujyan)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated: