  Kafka / KAFKA-10672

Restarting Kafka always takes a lot of time


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None
    • Environment:
      A cluster of 21 Kafka nodes.
      Each node has 12 disks.
      Each node has about 1500 partitions.
      Each node leads approximately 700 partitions.
      Slow-loading partitions have about 1000 log segments.

    Description

      When the producer state snapshot file does not exist, or the latest snapshot file is older than the current active segment, restoring producer state requires traversing the log segments and reading every batch in the log. On a broker that hosts many partitions, most of which have many log segments, this causes a very large number of I/O operations, because each read loads only a single batch. A single log segment can hold tens of thousands of batches, and I found that the code performs at least two I/O operations per batch. With the default batch size of 16 KB and a 1 GB log segment, a segment contains 65,536 batches, which means at least 65,536 × 2 = 131,072 I/O operations per segment. As a result, the Kafka startup process takes a very long time. We configured 15 log recovery threads in our production environment, and it still took more than 2 hours to load a single partition. Could the community propose a solution or an improvement for this situation? For detailed logs, see the section on the test-perf-18 partitions in the attached server.log.
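      The I/O arithmetic above can be reproduced with a small Java sketch. It is a back-of-the-envelope model, not Kafka code: the fixed cost of two reads per batch is taken from the description, the 1000-segment partition comes from the environment details and assumes every segment is scanned, and the 8 MiB buffered scan is a hypothetical comparison point rather than an existing Kafka setting. All names here (SegmentScanCost, perBatchReads, bufferedReads) are illustrative.

```java
// Back-of-the-envelope model of read counts when rebuilding producer state.
// Not Kafka code: READS_PER_BATCH = 2 follows the issue's observation of
// "at least two I/O operations per batch".
public final class SegmentScanCost {
    // Assumed minimum number of reads issued for each batch when batches
    // are fetched one at a time.
    static final int READS_PER_BATCH = 2;

    // Number of full batches that fit in one segment.
    static long batchesPerSegment(long segmentBytes, long batchBytes) {
        return segmentBytes / batchBytes;
    }

    // Reads needed when every batch is fetched individually.
    static long perBatchReads(long segmentBytes, long batchBytes) {
        return batchesPerSegment(segmentBytes, batchBytes) * READS_PER_BATCH;
    }

    // Reads needed by a hypothetical scan that reads the segment
    // sequentially in fixed-size chunks: ceil(segmentBytes / bufferBytes).
    static long bufferedReads(long segmentBytes, long bufferBytes) {
        return (segmentBytes + bufferBytes - 1) / bufferBytes;
    }

    public static void main(String[] args) {
        long segment = 1L << 30;         // 1 GiB log segment
        long batch = 16L * 1024;         // 16 KB batch
        long buffer = 8L * 1024 * 1024;  // hypothetical 8 MiB read buffer
        long segmentsPerPartition = 1000; // assumes all segments are scanned

        System.out.println("batches per segment:   " + batchesPerSegment(segment, batch));
        System.out.println("per-batch reads:       " + perBatchReads(segment, batch));
        System.out.println("per-partition reads:   " + perBatchReads(segment, batch) * segmentsPerPartition);
        System.out.println("buffered reads/segment: " + bufferedReads(segment, buffer));
    }
}
```

      Running main prints 65536 batches and 131072 reads per segment, 131072000 reads for a 1000-segment partition, and 128 reads per segment for the buffered model. The model counts read calls only; it ignores bytes transferred, page-cache hits, and disk latency, so it indicates the scale of the problem rather than actual startup time.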

      Attachments

        1. AbstractIterator.java (3 kB, Wenbing Shen)
        2. AbstractIteratorOfRestart.java (3 kB, Wenbing Shen)
        3. AbstractLegacyRecordBatch.java (22 kB, Wenbing Shen)
        4. ByteBufferLogInputStream.java (4 kB, Wenbing Shen)
        5. DefaultRecordBatch.java (27 kB, Wenbing Shen)
        6. FileLogInputStream.java (15 kB, Wenbing Shen)
        7. FileRecords.java (22 kB, Wenbing Shen)
        8. LazyDownConversionRecords.java (10 kB, Wenbing Shen)
        9. Log.scala (107 kB, Wenbing Shen)
        10. LogInputStream.java (2 kB, Wenbing Shen)
        11. LogManager.scala (39 kB, Wenbing Shen)
        12. LogSegment.scala (33 kB, Wenbing Shen)
        13. MemoryRecords.java (34 kB, Wenbing Shen)
        14. RecordBatchIterator.java (2 kB, Wenbing Shen)
        15. RecordBatchIteratorOfRestart.java (2 kB, Wenbing Shen)
        16. Records.java (6 kB, Wenbing Shen)
        17. server.log (33 kB, Wenbing Shen)


          People

            Assignee: Unassigned
            Reporter: Wenbing Shen
            Votes: 0
            Watchers: 4
