KUDU-3008

Don't put all replicas into one location when there are 2 locations and an odd replication factor

    Description

      I accidentally found that Kudu puts all replicas of a table into one location when there are only 2 locations and the replication factor is odd. The case is as follows:

      Location /DEFAULT/22254 has 3 tservers
      Location /DEFAULT/22255 has 3 tservers
      Table created: replication factor = 3, tablets = 8.

      Before I created the table, the ksck tablet summary was:

       

      Tablet Replica Count by Tablet Server
                     UUID               |                Host                | Replica Count |    Location    
      ----------------------------------+------------------------------------+---------------+----------------
       5f5ddec364834ce59282d37388010f06 | opencomputeoffline.xxxxxx.net:7056 | 10            | /DEFAULT/22255 
       00f24c36d39a49e8b77ff43b3bcbf0c9 | opencomputeoffline.xxxxxx.net:7054 | 10            | /DEFAULT/22255 
       d0091ae869704458865b9b079ad2389e | opencomputeoffline.xxxxxx.net:7055 | 9             | /DEFAULT/22255 
       507547dd183c4474855d55f7bdd9d526 | opencomputeoffline.xxxxxx.net:7052 | 7             | /DEFAULT/22254 
       c6a2b6e99f0a43308d9e5773b2d8c729 | opencomputeoffline.xxxxxx.net:7053 | 6             | /DEFAULT/22254 
       031808c37385477fb063e50fbc614f44 | opencomputeoffline.xxxxxx.net:7050 | 6             | /DEFAULT/22254 

      After I created the table, the ksck tablet summary was:

       

      Tablet Replica Count by Tablet Server
                     UUID               |                Host                | Replica Count |    Location
      ----------------------------------+------------------------------------+---------------+----------------
       507547dd183c4474855d55f7bdd9d526 | opencomputeoffline.xxxxxx.net:7052 | 15            | /DEFAULT/22254
       c6a2b6e99f0a43308d9e5773b2d8c729 | opencomputeoffline.xxxxxx.net:7053 | 14            | /DEFAULT/22254
       031808c37385477fb063e50fbc614f44 | opencomputeoffline.xxxxxx.net:7050 | 14            | /DEFAULT/22254
       5f5ddec364834ce59282d37388010f06 | opencomputeoffline.xxxxxx.net:7056 | 10            | /DEFAULT/22255
       00f24c36d39a49e8b77ff43b3bcbf0c9 | opencomputeoffline.xxxxxx.net:7054 | 10            | /DEFAULT/22255
       d0091ae869704458865b9b079ad2389e | opencomputeoffline.xxxxxx.net:7055 | 9             | /DEFAULT/22255
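
To make the change between the two summaries explicit, the per-location replica totals can be compared. This is only a small arithmetic check written in Python; the counts are copied from the two ksck tables above, and the variable names are illustrative.

```python
# Replica counts per tablet server, grouped by location,
# copied from the ksck summaries before and after table creation.
before = {
    "/DEFAULT/22255": [10, 10, 9],
    "/DEFAULT/22254": [7, 6, 6],
}
after = {
    "/DEFAULT/22254": [15, 14, 14],
    "/DEFAULT/22255": [10, 10, 9],
}

# New replicas per location = total after - total before.
delta = {loc: sum(after[loc]) - sum(before[loc]) for loc in before}

# The new table has 8 tablets with 3 replicas each.
expected_new_replicas = 8 * 3

for loc, added in sorted(delta.items()):
    print(f"{loc}: +{added} replicas")
print(f"expected new replicas in total: {expected_new_replicas}")
```

All 24 new replicas (8 tablets x 3 replicas) are accounted for by /DEFAULT/22254, while the total for /DEFAULT/22255 is unchanged.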

      I found that /DEFAULT/22255 received no new replicas: all of them were placed in /DEFAULT/22254. Looking into the code, I found that in PlacementPolicy::SelectLocation, when the number of locations is 2, we only handle an even replication factor and try to spread replicas evenly across the 2 locations. I think we should also handle an odd replication factor. With 2 locations, one location must hold more than half of the replicas, but that is still better than one location holding all of them.
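
The behavior proposed above can be illustrated with a minimal sketch. This is not Kudu's PlacementPolicy::SelectLocation code (which is C++); it is a hypothetical Python model in which each replica of a tablet goes to the location currently holding the fewest replicas of that tablet, with ties broken by the lower average load per tablet server. The function name `place_replicas` and the input layout are assumptions made for illustration only.

```python
def place_replicas(num_replicas, locations):
    """Hypothetical placement of one tablet's replicas across locations.

    locations maps a location name to a tuple
    (number of tablet servers, existing replica count in that location).
    Returns a dict: location name -> replicas of this tablet placed there.
    """
    placed = {name: 0 for name in locations}
    for _ in range(num_replicas):
        # A location can host at most one replica of a tablet per tserver.
        candidates = [
            name for name, (num_tservers, _) in locations.items()
            if placed[name] < num_tservers
        ]
        if not candidates:
            raise ValueError("not enough tablet servers for all replicas")

        def sort_key(name):
            num_tservers, load = locations[name]
            # First spread this tablet's replicas, then prefer lower load.
            return (placed[name], (load + placed[name]) / num_tservers)

        best = min(candidates, key=sort_key)
        placed[best] += 1
    return placed


# The case from this report: 2 locations, 3 tservers each,
# existing totals of 19 and 29 replicas, replication factor 3.
locations = {
    "/DEFAULT/22254": (3, 19),
    "/DEFAULT/22255": (3, 29),
}
result = place_replicas(3, locations)
print(result)
```

Under these assumptions the tablet ends up with 2 replicas in /DEFAULT/22254 and 1 in /DEFAULT/22255. One location still holds a majority, which is unavoidable with 2 locations and an odd replication factor, but neither location holds every replica.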

          People

            ZhangYao