KAFKA-8526

Broker may select a failed dir for new replica even in the presence of other live dirs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 2.0.1, 2.1.1, 2.3.0, 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: None
    • Labels: None

    Description

      Suppose a broker is configured with multiple log dirs. One of the log dirs fails, but there is no load on that dir, so the broker does not know about the failure yet, i.e., the failed dir is still in LogManager#_liveLogDirs. Suppose a new topic gets created, and the controller chooses the broker with the failed log dir to host one of the replicas. The broker gets a LeaderAndIsr request with the isNew flag set. LogManager#getOrCreateLog() selects a log dir for the new replica from _liveLogDirs, then one of two things can happen:
      1) getAbsolutePath can fail, in which case getOrCreateLog will throw an IOException.
      2) Creating the directory for the new replica log may fail (e.g., if the directory becomes read-only, so getAbsolutePath worked).

      In both cases, the selected dir will be marked offline (which is correct). However, the LeaderAndIsr request will return an error and the replica will be marked offline, even though the broker may have other live dirs.

      Proposed solution: The broker should retry selecting a dir for the new replica if the initially selected dir threw an IOException when trying to create a directory for the new replica. We should be able to do that in the LogManager#getOrCreateLog() method, but keep in mind that logDirFailureChannel.maybeAddOfflineLogDir does not synchronously remove the dir from _liveLogDirs. So, it makes sense to select the initial dir by calling LogManager#nextLogDir (current implementation), but if we fail to create the log on that dir, one approach is to select the next dir from _liveLogDirs in round-robin fashion, stopping when we get back to the initial log dir (the case where all dirs failed).
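
      The round-robin retry proposed above could look roughly like the following. This is a minimal standalone sketch, not the actual LogManager code: the class LogDirRetrySketch, the methods createReplicaDir and createWithRetry, and the failedDirs list (standing in for reporting to logDirFailureChannel) are all hypothetical names introduced for illustration. It assumes the caller passes a snapshot of the live dirs and the index of the initially selected dir.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the proposed retry; not Kafka's LogManager.
final class LogDirRetrySketch {

    // Stand-in for creating the replica's directory under a log dir.
    // Throws IOException if the log dir is unusable (e.g. read-only or not a directory).
    static Path createReplicaDir(Path logDir, String topicPartition) throws IOException {
        return Files.createDirectories(logDir.resolve(topicPartition));
    }

    // Try the initially selected dir first, then the remaining dirs in
    // round-robin order. Each dir is tried at most once, so the loop ends
    // when it would wrap around to the initial dir (all dirs failed).
    static Path createWithRetry(List<Path> liveLogDirs,
                                int initialIndex,
                                String topicPartition,
                                List<Path> failedDirs) throws IOException {
        int n = liveLogDirs.size();
        IOException last = null;
        for (int i = 0; i < n; i++) {
            Path candidate = liveLogDirs.get((initialIndex + i) % n);
            try {
                return createReplicaDir(candidate, topicPartition);
            } catch (IOException e) {
                // Record the failure (a stand-in for marking the dir offline)
                // and move on; the dir may still be in the live list because
                // removal is assumed to be asynchronous.
                failedDirs.add(candidate);
                last = e;
            }
        }
        throw new IOException("No live log dir could host " + topicPartition, last);
    }
}
```

      Iterating over a fixed snapshot, rather than re-reading the live list on every attempt, is one way to cope with the dir not being removed from _liveLogDirs synchronously: a dir that has already failed is never selected twice within the same call.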

            People

              Assignee: Igor Soarez (soarez)
              Reporter: Anna Povzner (apovzner)
              Votes: 2
              Watchers: 3