Kafka / KAFKA-3359

Parallel log-recovery of un-flushed segments on startup

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0.0, 0.11.0.0, 1.0.0
    • Fix Version/s: None
    • Component/s: log
    • Labels: None
    • Flags: Patch

    Description

      On startup after an unclean shutdown, the log segments within a logDir are currently loaded sequentially. Because logSegment.recover(..) is called for every segment, loading takes a long time on brokers with many partitions (we have observed ~40 minutes for 2k partitions).

      https://github.com/apache/kafka/pull/1035

      This pull request parallelizes log-segment loading and adds two configurable properties, "log.recovery.threads" and "log.recovery.max.interval.ms".

      Logic:
      1. Create a fixed-size thread pool (size set by log.recovery.threads).
      2. Submit each logSegment recovery as a job to the thread pool and add the returned future to a job list.
      3. Wait until all jobs are done within the required time (log.recovery.max.interval.ms, which defaults to Long.Max).
      4. If all jobs are done and every future's result is null (meaning the job completed successfully), recovery is considered done.
      5. If any recovery job failed, the failure is logged and a LogRecoveryFailedException is thrown.
      6. If the timeout is reached, a LogRecoveryFailedException is thrown.
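
      The steps above can be sketched roughly as follows. This is a minimal illustration written against java.util.concurrent, not the code from the pull request: the class name ParallelRecoverySketch, the method recoverAll, and the use of plain Runnable jobs in place of logSegment.recover(..) are all assumptions made for the example, and LogRecoveryFailedException here is a local stand-in for the exception named in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelRecoverySketch {

    /** Local stand-in (assumption) for the exception named in the description. */
    public static class LogRecoveryFailedException extends RuntimeException {
        public LogRecoveryFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs every recovery job on a fixed-size pool and waits up to maxIntervalMs.
     * Each Runnable is a hypothetical placeholder for one segment's recovery.
     */
    public static void recoverAll(List<Runnable> recoveryJobs, int threads, long maxIntervalMs) {
        // Step 1: fixed-size pool (would be sized by log.recovery.threads).
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        // Step 2: submit each job and remember its future.
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable job : recoveryJobs) {
            futures.add(pool.submit(job));
        }
        pool.shutdown(); // no new jobs; the submitted ones keep running

        try {
            // Steps 3 and 6: wait for completion, fail if the time limit is hit.
            if (!pool.awaitTermination(maxIntervalMs, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow(); // interrupt whatever is still running
                throw new LogRecoveryFailedException(
                        "Log recovery timed out after " + maxIntervalMs + " ms", null);
            }
            // Step 4: get() returns null for a Runnable that finished normally.
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (ExecutionException e) {
            // Step 5: a job threw; surface the original cause.
            throw new LogRecoveryFailedException("A log recovery job failed", e.getCause());
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new LogRecoveryFailedException("Interrupted while waiting for recovery", e);
        }
    }

    public static void main(String[] args) {
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int segment = i;
            jobs.add(() -> System.out.println("recovered segment " + segment));
        }
        recoverAll(jobs, 2, Long.MAX_VALUE);
    }
}
```

      One design choice in this sketch is to call shutdown() followed by awaitTermination(..) so that a single deadline covers all jobs, instead of applying the timeout to each future separately.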

      The logic is backward compatible with the current sequential implementation because the default thread count is 1.
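
      To illustrate why a thread count of 1 behaves like sequential loading: a fixed pool with a single worker takes jobs from its queue in submission order, one at a time. The sketch below is only a demonstration of that java.util.concurrent behavior; the class SingleThreadOrderSketch and the method runInPool are hypothetical names, and the integer job ids stand in for segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderSketch {

    /** Runs `jobs` numbered jobs on a pool of `threads` workers; returns the order in which they ran. */
    public static List<Integer> runInPool(int threads, int jobs) {
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            pool.execute(() -> order.add(id)); // each job records when it ran
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("jobs did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return order;
    }

    public static void main(String[] args) {
        // With one worker the recorded order matches submission order.
        System.out.println(runInPool(1, 5));
        // With several workers every job still runs, but the order may vary.
        System.out.println(runInPool(4, 5));
    }
}
```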

      PS: I am new to Scala, so the code might look Java-ish, but I will be happy to make any changes requested in code review.

            People

              Jay Kreps (jkreps)
              Vamsi Subhash Achanta (vamsi360)
              Grant Henke
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: