Kafka / KAFKA-3064

Improve resync method to waste less time and data transfer


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.2.1, 0.9.0.0
    • Fix Version/s: None
    • Component/s: controller, network
    • Labels: None

    Description

      We have several topics which are large (65 GB per partition), each with 12 partitions. Data rates into the topics vary, but in general each one has its own steady rate.

      After a RAID rebuild, we are pulling all of the data over to the newly rebuilt RAID array. This is taking forever, and has yet to complete after nearly 8 hours.

      Here are my observations:

      (1) The Kafka broker seems to pull from all partitions of all topics at the same time, starting at the oldest message.

      (2) When you divide the total available disk bandwidth across all partitions (really, only 48 of which hold significant amounts of data, about 65 GB * 12 = 780 GB per topic), the ingest rate of 36 of those 48 partitions is higher than their per-partition share of that bandwidth.

      (3) The effect of (2) is that one topic SLOWLY catches up, while the other 4 topics continue to retrieve data at 75% of the bandwidth, only to throw it away because the source broker has already discarded it.

      (4) Eventually that one topic catches up, and the remaining bandwidth is then divided among the remaining 36 partitions, one group of which starts to catch up in turn.
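To make the dynamics in (2)-(4) concrete, here is a back-of-the-envelope sketch with made-up numbers (the bandwidth and ingest figures are assumptions for illustration, not measurements from this cluster): a partition only makes net progress when its equal share of the resync bandwidth exceeds its ingest rate, and every topic that finishes frees up bandwidth for the rest.

```python
# Hypothetical numbers illustrating the catch-up arithmetic above.
B = 480.0               # total MB/s of disk bandwidth available for resync (assumed)
partitions = 48         # partitions with significant data
ingest = 8.0            # per-partition ingest rate in MB/s (assumed)

share = B / partitions          # 10.0 MB/s per partition while all 48 resync
net_progress = share - ingest   # only 2.0 MB/s of actual catch-up per partition

# Once one topic's 12 partitions finish, the shares of the rest grow:
remaining = partitions - 12
new_share = B / remaining       # ~13.3 MB/s, so the next group catches up faster
print(share, net_progress, round(new_share, 1))
```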

      What I want to see is a way to say “don’t transfer more than X partitions at the same time,” and ideally a priority rule that says, “transfer partitions you are responsible for first, then transfer the ones you are not. Also, transfer these topics first, then those, but no more than one topic at a time.”
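A minimal sketch of what such a scheduling policy could look like, in Python for illustration only (the PartitionResync fields and the schedule helper are hypothetical, not part of Kafka):

```python
# Hypothetical sketch: cap concurrent partition transfers and order them
# by priority (owned partitions first, then per-topic priority, with at
# most one topic in flight at a time).
from dataclasses import dataclass

@dataclass
class PartitionResync:
    topic: str
    partition: int
    owned: bool           # is this broker the assigned owner of the replica?
    topic_priority: int   # lower value = transfer sooner (assumed operator knob)

def schedule(pending, max_in_flight):
    """Pick the next batch: one topic at a time, owned replicas first."""
    ordered = sorted(
        pending,
        key=lambda p: (not p.owned, p.topic_priority, p.topic, p.partition),
    )
    if not ordered:
        return []
    # Restrict the batch to the single highest-priority topic.
    first_topic = ordered[0].topic
    batch = [p for p in ordered if p.topic == first_topic]
    return batch[:max_in_flight]
```

For example, with two owned "metrics" partitions and one unowned "logs" partition pending, `schedule(pending, 2)` would return only the two "metrics" partitions.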

      What I REALLY want is for Kafka to track the new data (track the head of the log) and then fetch the tail in chunks. Ideally the follower would ask the source, “what is the next logical older starting point?” and start there. This way, the transfer basically becomes a file transfer of the log as stored on the source’s disk. Once a chunk is retrieved, it moves on to the next oldest. There is then almost zero waste as both the head and the tail grow; only the final (oldest) chunk of the tail risks being lost to retention, so bandwidth is not significantly wasted.
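A rough sketch of that backward, chunked tail fetch, assuming a hypothetical fetch_chunk callback that stands in for an RPC the real replication protocol does not expose:

```python
# Sketch of the proposed "head plus chunked tail" resync. The follower
# keeps consuming new appends at the head (not shown); the tail is
# backfilled from newest-needed toward the log start, one chunk at a
# time, so retention on the source can only invalidate the last
# (oldest) chunk still outstanding.

def resync_tail(log_start, tail_start, chunk_size, fetch_chunk):
    """Backfill [log_start, tail_start) in chunks, newest first.

    fetch_chunk(start, end) is a hypothetical RPC that transfers the
    raw log bytes/messages for the half-open offset range [start, end).
    """
    next_end = tail_start
    while next_end > log_start:
        # "What is the next logical older starting point?" -- here,
        # simply one chunk back (segment alignment is assumed).
        chunk_start = max(log_start, next_end - chunk_size)
        fetch_chunk(chunk_start, next_end)
        next_end = chunk_start

# Example: backfilling offsets [0, 10) in chunks of 4 requests the
# ranges (6, 10), (2, 6), (0, 2) in that order.
fetched = []
resync_tail(0, 10, 4, lambda s, e: fetched.append((s, e)))
```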

      All this changes the ISR check to “am I caught up on both head AND tail?”, where the tail part is only implied right now.
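A minimal sketch of that adjusted in-sync predicate (the offsets, lag bound, and backfill flag are all hypothetical, for illustration only):

```python
# Hypothetical ISR predicate: a replica is in sync only when it is both
# caught up on the head (recent appends, within some lag bound) AND has
# finished backfilling the tail.

def is_in_sync(leader_end_offset, follower_end_offset,
               tail_backfill_done, max_head_lag):
    head_caught_up = (leader_end_offset - follower_end_offset) <= max_head_lag
    return head_caught_up and tail_backfill_done
```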


          People

            Assignee: Neha Narkhede (nehanarkhede)
            Reporter: Michael Graff (Skandragon)
            Votes: 0
            Watchers: 3
