Kafka / KAFKA-768

Broker should exit if it hits an exception during startup

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: core
    • Labels:

      Description

      A broker hit the following exception, but didn't exit.

      2013/02/20 01:54:21.341 FATAL [KafkaServerStartable] [main] [kafka] [] Fatal error during KafkaServerStable startup. Prepare to shutdown
      kafka.common.KafkaException: Failed to create data directory /export/content/kafka/i001_caches
      at kafka.log.LogManager$$anonfun$createAndValidateLogDirs$2.apply(LogManager.scala:77)
      at kafka.log.LogManager$$anonfun$createAndValidateLogDirs$2.apply(LogManager.scala:72)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
      at kafka.log.LogManager.createAndValidateLogDirs(LogManager.scala:72)
      at kafka.log.LogManager.<init>(LogManager.scala:60)
      at kafka.server.KafkaServer.startup(KafkaServer.scala:59)
      at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
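
      The requested behavior can be sketched roughly as follows. This is a minimal illustration, not the committed patch: `Server` and `StartableSketch` are hypothetical stand-ins for the real classes, and the exit call is injected as a parameter (an assumption made here) so the sketch can be exercised without terminating the JVM.

```scala
// Hypothetical stand-in for a server whose startup may throw.
trait Server {
  def startup(): Unit
  def shutdown(): Unit
}

// Sketch of a startable wrapper: on a startup failure, log the error and
// exit instead of leaving a half-initialized process running.
// `exit` defaults to sys.exit and is injectable only for illustration.
class StartableSketch(server: Server, exit: Int => Unit = code => sys.exit(code)) {
  def startup(): Unit = {
    try {
      server.startup()
    } catch {
      case e: Throwable =>
        System.err.println("Fatal error during startup. Prepare to shutdown: " + e.getMessage)
        exit(1)
    }
  }
}
```

      With the default `exit`, a failure such as the `KafkaException` above would terminate the process with a non-zero status rather than leaving it alive.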

        Activity

        Guozhang Wang added a comment -

        We encountered a general deadlock issue with System.exit(). A startup/shutdown-aware container wrapping KafkaServerStartable has a shutdown hook that is executed by another thread. That hook blocks waiting for the main thread in startup() to set the startup_finished flag, while the main thread blocks waiting for the shutdown hook thread to join.

        So I would like to propose a slightly different solution:

        1. Replace System.exit(1) in startup with the shutdown() call.
        2. In KafkaServer.shutdown, move

                brokerState.newState(NotRunning)
                shutdownLatch.countDown()
                startupComplete.set(false)
        

        to the finally block.

        3. Remove the System.exit(1) call in shutdown().

        This resolves the issue and also works with a startup/shutdown-aware container whose shutdown hook depends on the startup logic completing.
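
        The three steps might look roughly like the sketch below. It is a simplified model, not the actual KafkaServer code: `ServerSketch`, `startWork`, `stopWork`, and the string-valued state are hypothetical names introduced for illustration, and rethrowing the startup exception after cleanup is an assumption that the proposal does not spell out.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified model of the server; not the real KafkaServer.
class ServerSketch(startWork: () => Unit, stopWork: () => Unit) {
  private val startupComplete = new AtomicBoolean(false)
  private val shutdownLatch = new CountDownLatch(1)
  @volatile private var state: String = "NotRunning"

  def startup(): Unit = {
    try {
      state = "Starting"
      startWork()
      startupComplete.set(true)
      state = "Running"
    } catch {
      case e: Throwable =>
        // Step 1: clean up through shutdown() rather than System.exit(1).
        shutdown()
        // Assumption: surface the failure to the caller, who decides whether to exit.
        throw e
    }
  }

  def shutdown(): Unit = {
    try {
      stopWork()
    } finally {
      // Step 2: always runs, so waiters are released even if stopWork() throws.
      state = "NotRunning"
      shutdownLatch.countDown()
      startupComplete.set(false)
    }
    // Step 3: no System.exit(1) here.
  }

  def awaitShutdown(): Unit = shutdownLatch.await()
  def currentState: String = state
  def isStartupComplete: Boolean = startupComplete.get
}
```

        Because neither method calls System.exit(), no shutdown hook is triggered from inside startup(), so a container's hook never has to wait on a startup that cannot finish.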

        Swapnil Ghike added a comment -

        Filed KAFKA-770 with a patch.

        Jun Rao added a comment -

        Thanks for the review. Committed to 0.8.

        Swapnil,

        I agree. Could you file a separate jira to track that?

        Neha Narkhede added a comment -

        +1

        Swapnil Ghike added a comment -

        Also to maintain consistency with Producer/ConsumerConfigs, perhaps we should verify KafkaConfig properties in the secondary constructor instead of having a separate verify method.
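
        As a rough illustration of that suggestion, the sketch below validates properties inside a secondary constructor so that no separate verify step is needed. `ConfigSketch` is a hypothetical class, not KafkaConfig, and the two property names and their checks are chosen only for illustration.

```scala
import java.util.Properties

// Hypothetical config class; the primary constructor is private so that
// every instance is built (and validated) through the secondary one.
class ConfigSketch private (val brokerId: Int, val logDirs: Seq[String]) {

  def this(props: Properties) = {
    this(
      props.getProperty("broker.id", "-1").toInt,
      props.getProperty("log.dirs", "").split(",").map(_.trim).filter(_.nonEmpty).toSeq
    )
    // Validation is part of construction, so an invalid config
    // fails here with an IllegalArgumentException.
    require(brokerId >= 0, "broker.id must be non-negative")
    require(logDirs.nonEmpty, "log.dirs must name at least one directory")
  }
}
```

        With this shape, a caller cannot obtain a config object and forget to verify it, since construction and verification happen together.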

        Swapnil Ghike added a comment -

        +1

        Sriram Subramanian added a comment -

        +1

        Jun Rao added a comment -

        Attached a patch.


          People

          • Assignee: Jun Rao
          • Reporter: Jun Rao
          • Votes: 0
          • Watchers: 5
