Kafka / KAFKA-15490

Invalid path provided to the log failure channel upon I/O error when writing broker metadata checkpoint

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0, 3.4.1, 3.5.1, 3.6.1
    • Fix Version/s: 3.6.2
    • Component/s: core
    • Labels: None

    Description

      There is a small bug in how KafkaServer handles an I/O error when writing the broker metadata checkpoint. The path passed to the log dir failure channel is the full path of the checkpoint file, whereas the channel expects only the log directory.

      case e: IOException =>
         val dirPath = checkpoint.file.getAbsolutePath
         logDirFailureChannel.maybeAddOfflineLogDir(dirPath, s"Error while writing meta.properties to $dirPath", e)

      As a result, after an IOException is captured and enqueued in the log dir failure channel (<logDir> is to be replaced with the actual path of the log directory):

      [2023-09-22 17:07:32,052] ERROR Error while writing meta.properties to <logDir>/meta.properties (kafka.server.LogDirFailureChannel) java.io.IOException

      The log dir failure handler then cannot look up the log directory:

      [2023-09-22 17:07:32,053] ERROR [LogDirFailureHandler]: Error due to (kafka.server.ReplicaManager$LogDirFailureHandler) org.apache.kafka.common.errors.LogDirNotFoundException: Log dir <logDir>/meta.properties is not found in the config.

      An immediate fix is to use the logDir passed to the checkpointing method instead of the path of the metadata file.
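The proposed fix can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual KafkaServer code: StubLogDirFailureChannel is a hypothetical stand-in for Kafka's LogDirFailureChannel, and checkpointBrokerMetadata with its write hook is an assumed shape for the checkpointing method. The only point is which path reaches the failure channel.

```scala
import java.io.{File, IOException}
import scala.collection.mutable

// Hypothetical stand-in for Kafka's LogDirFailureChannel.
// It only records which directories were reported as offline.
final class StubLogDirFailureChannel {
  val offlineDirs: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty[String]

  def maybeAddOfflineLogDir(logDir: String, msg: => String, e: IOException): Unit =
    offlineDirs += logDir
}

object CheckpointFixSketch {
  // `write` is a hypothetical hook standing in for the real checkpoint write.
  def checkpointBrokerMetadata(logDir: String, channel: StubLogDirFailureChannel)(
      write: File => Unit): Unit = {
    val checkpointFile = new File(logDir, "meta.properties")
    try write(checkpointFile)
    catch {
      case e: IOException =>
        // Report the log directory itself, not checkpointFile.getAbsolutePath,
        // so that the failure handler can match it against the configured dirs.
        channel.maybeAddOfflineLogDir(
          logDir, s"Error while writing meta.properties to $logDir", e)
    }
  }
}
```

With this shape, a failed write for a log directory enqueues that directory, which is a value the handler can find in the broker configuration.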

      For brokers with only one log directory, this bug prevents the broker from shutting down as expected.

      The LogDirNotFoundException kills the log dir failure handler thread, so subsequent IOExceptions are not handled and the broker never stops.

      [2024-02-27 02:13:13,564] INFO [LogDirFailureHandler]: Stopped (kafka.server.ReplicaManager$LogDirFailureHandler)

      Another consideration here is whether the LogDirNotFoundException should terminate the log dir failure handler thread.
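One way to picture that consideration is a handler loop that records an unknown directory and keeps draining the queue instead of dying. The sketch below is an assumption-laden illustration, not Kafka's implementation: LogDirNotFound stands in for LogDirNotFoundException, and ResilientFailureHandler, enqueue, and drain are hypothetical names. Threading is omitted to keep the example small.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-in for org.apache.kafka.common.errors.LogDirNotFoundException.
final class LogDirNotFound(msg: String) extends RuntimeException(msg)

// Hypothetical handler: processes reported directories without letting an
// unknown directory stop the processing of later entries.
final class ResilientFailureHandler(configuredDirs: Set[String]) {
  private val queue = new LinkedBlockingQueue[String]()
  var handled: Vector[String] = Vector.empty
  var rejected: Vector[String] = Vector.empty

  def enqueue(dir: String): Unit = queue.put(dir)

  private def handle(dir: String): Unit = {
    if (!configuredDirs.contains(dir))
      throw new LogDirNotFound(s"Log dir $dir is not found in the config.")
    handled :+= dir
  }

  // Drains everything currently queued; a real handler thread would loop on this.
  def drain(): Unit = {
    var dir = queue.poll()
    while (dir != null) {
      try handle(dir)
      catch {
        // Record the bad entry and continue rather than terminating the loop.
        case _: LogDirNotFound => rejected :+= dir
      }
      dir = queue.poll()
    }
  }
}
```

Whether swallowing the exception is the right behavior is exactly the open question raised above: continuing keeps later failures handled, but it may also hide a misreported path such as the one in this issue.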

          People

            divijvaidya Divij Vaidya
            adupriez Alexandre Dupriez