Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-2020

tserver failure causes multiple tablet copy operations per under-replicated tablet



    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.3.1
    • 1.3.2, 1.4.0
    • tserver
    • None


      Combing through logs of a cluster recovering from tserver failure and reading the tablet copy code revealed that if a tserver's tablet copy threadpool is full (which defaults to 10 slots), and a duplicate tablet copy request is received from a leader, the tablet server will respond with a THROTTLED error instead of an ALREADY_INPROGRESS error. If this situation continues for 300 seconds, the leader will eject the replica, which has the effect of starting anothe tablet copy on another tserver. Meanwhile, the original tablet copy may still be making progress in the first follower tserver.

      As a result, a single tablet can be in the process of being copied multiple times if a cluster is undergoing many tablet copies, for instance if a tserver with hundreds of tablets fails.

      It's expected that a cluster which is recovering from tserver failure would have roughly equal aggregate read and write IO throughput, however because of this bug, the recovering cluster exhibited significantly more write disk throughput than read, since many tablet copies were served from cache. Attached is a representative graph. The spike around 5:07 is when fsyncs were disabled cluster wide (fsync tablet copy overhead is being tracked in KUDU-1726).

      Thanks to Adar and Todd for helping me track this down.


        1. screenshot-1.png
          139 kB
          Dan Burkert



            danburkert Dan Burkert
            danburkert Dan Burkert
            0 Vote for this issue
            2 Start watching this issue