
KUDU-1317: Tablet re-replication is not well spread across a cluster


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: master
    • Labels: None

    Description

      This is an issue that decster noticed on his ~70 node cluster. When a server hosting many tablets goes down, each of those tablets has to create new replicas elsewhere. We would expect in a 70-node cluster that all other nodes would participate in recovery in order to minimize the recovery time. However, we found that only a small number of nodes acted as 'sources' for making new tablet replicas.

      The issue is that the master currently assigns replicas in strict round-robin order. So, if we have a cluster whose number of tablet servers is a multiple of three, we end up with servers {A,B,C} having the same set of replicas, servers {D,E,F} having another set, and so on. If a server fails, only two servers can act as re-replication sources. If the number of servers is not a multiple of three, the problem is not quite as bad, but recovery is still limited to four sources (the two "adjacent" servers on either side in the round-robin order). The sketch below illustrates the multiple-of-three case.

      The master should spread out the replicas more randomly so that when a server goes down, a large number of other servers can act as sources for re-replication.
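
      As a minimal sketch of that direction, assuming a simple uniform-random policy (a real placement policy would also need to account for load and disk usage), choosing each tablet's replica set at random spreads any one server's replicas across the cluster, so almost every surviving server can participate in recovery:

          import random

          NUM_SERVERS = 9
          NUM_TABLETS = 300
          REPLICATION_FACTOR = 3
          servers = [chr(ord('A') + i) for i in range(NUM_SERVERS)]

          random.seed(0)
          # Pick each tablet's replica set uniformly at random.
          replicas = {t: random.sample(servers, REPLICATION_FACTOR)
                      for t in range(NUM_TABLETS)}

          failed = 'A'
          sources = {s for rs in replicas.values() if failed in rs
                       for s in rs if s != failed}
          print(sorted(sources))  # typically all eight surviving servers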

      Attachments

        1. placement.py (1.0 kB, Todd Lipcon)


          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: