KAFKA-1758: corrupt recovery file prevents startup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0.0
    • Component/s: log
    • Labels:

      Description

      Hi,

      We recently had a Kafka node go down suddenly. When it came back up, it apparently had a corrupt recovery file and refused to start up:

      2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up KafkaServer
      java.lang.NumberFormatException: For input string: "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
      ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
              at java.lang.Integer.parseInt(Integer.java:481)
              at java.lang.Integer.parseInt(Integer.java:527)
              at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
              at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
              at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
              at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
              at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
              at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
              at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
              at kafka.log.LogManager.loadLogs(LogManager.scala:105)
              at kafka.log.LogManager.<init>(LogManager.scala:57)
              at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
              at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
      

      The app runs under a process monitor, so it kept restarting and failing with this error for several minutes before we got to it.

      We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and the broker then restarted cleanly (though of course it re-synced all its data from the replicas, so we had no data loss).
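For anyone hitting the same failure, the manual workaround can be scripted. The sketch below is a hypothetical helper, not a Kafka tool: it assumes the broker is stopped, that the checkpoint file sits directly inside each log directory passed on the command line, and it renames the file with an invented `.corrupt` suffix (rather than deleting it) so the file can still be inspected later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class MoveCheckpointAside {
    // File name taken from the report; the ".corrupt" suffix below is a hypothetical choice.
    static final String CHECKPOINT = "recovery-point-offset-checkpoint";

    /**
     * Renames the checkpoint in logDir to CHECKPOINT + ".corrupt".
     * Returns the new path, or null if there was no checkpoint to move.
     * Assumes the broker is NOT running.
     */
    static Path moveAside(Path logDir) throws IOException {
        Path checkpoint = logDir.resolve(CHECKPOINT);
        if (!Files.exists(checkpoint)) {
            return null;
        }
        Path backup = logDir.resolve(CHECKPOINT + ".corrupt");
        // Keep the bad file for later inspection instead of deleting it.
        return Files.move(checkpoint, backup, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be one broker log directory.
        for (String dir : args) {
            Path moved = moveAside(Paths.get(dir));
            System.out.println(dir + ": " + (moved == null ? "no checkpoint" : "moved to " + moved));
        }
    }
}
```

Renaming instead of deleting matches what the reporter did ("moved ... out of the way") and preserves the evidence, which they also chose to keep.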

      Is this the expected behavior? Or should the broker instead flag the file as corrupt and proceed automatically with an unclean restart?

      Should this NumberFormatException be handled more gracefully?
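One way to handle it more gracefully is to treat an unparseable checkpoint the same as a missing one. The following is a minimal sketch of that idea, not the change that was actually committed for this issue: `TolerantCheckpointReader` and `parse` are hypothetical names, and the code assumes the checkpoint layout is a version line (`0`), an entry-count line, then one `topic partition offset` line per entry. Malformed content yields `Optional.empty()` instead of an exception, leaving the caller to log a warning and fall back to recovering the logs from scratch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TolerantCheckpointReader {
    static final int ASSUMED_VERSION = 0; // assumption: checkpoint format version 0

    /**
     * Parses checkpoint lines laid out (by assumption) as: version, entry count,
     * then "topic partition offset" per entry. Returns Optional.empty() for any
     * malformed content instead of throwing, so the caller can choose a full recovery.
     */
    static Optional<Map<String, Long>> parse(List<String> lines) {
        try {
            if (lines.size() < 2 || Integer.parseInt(lines.get(0).trim()) != ASSUMED_VERSION) {
                return Optional.empty();
            }
            int expected = Integer.parseInt(lines.get(1).trim());
            List<String> entries = lines.subList(2, lines.size());
            if (entries.size() != expected) {
                return Optional.empty(); // truncated or padded file
            }
            Map<String, Long> offsets = new HashMap<>();
            for (String entry : entries) {
                String[] parts = entry.trim().split("\\s+");
                if (parts.length != 3) {
                    return Optional.empty();
                }
                int partition = Integer.parseInt(parts[1]);
                long offset = Long.parseLong(parts[2]);
                offsets.put(parts[0] + "-" + partition, offset);
            }
            return Optional.of(offsets);
        } catch (NumberFormatException e) {
            // e.g. a zero-filled file: NUL characters are not digits
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<String> valid = Arrays.asList("0", "1", "my-topic 0 42");
        List<String> zeroFilled = Arrays.asList("\0\0\0\0\0\0\0\0");
        for (List<String> lines : Arrays.asList(valid, zeroFilled)) {
            String result = parse(lines)
                    .map(m -> "recovery points: " + m)
                    .orElse("unreadable checkpoint; recover all logs from scratch");
            System.out.println(result);
        }
    }
}
```

Returning an empty result rather than throwing keeps the policy decision (fail fast, or warn and run a full recovery) with the caller, which is the choice the question above is really about.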

      We saved the corrupt file in case it’s worth inspecting (although I doubt it will be useful!).

      The corrupt files appeared to be all zeroes.
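That observation is enough to explain the stack trace: zero bytes decode to NUL characters (the `^@` runs in the log), and `Integer.parseInt` rejects them with a `NumberFormatException`. The sketch below reproduces just that step in isolation; it does not read a real Kafka checkpoint, and the helper name and 64-byte buffer size are arbitrary choices for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ZeroFilledCheckpoint {
    // Decode a zero-filled buffer as text: every 0x00 byte becomes a NUL
    // character (code point 0), which terminals often display as ^@.
    static String firstLineOfZeroFile(int size) {
        byte[] zeros = new byte[size]; // Java arrays are zero-initialized
        String text = new String(zeros, StandardCharsets.UTF_8);
        int newline = text.indexOf('\n');
        // No newline exists in an all-zero buffer, so the whole buffer is "line one".
        return newline >= 0 ? text.substring(0, newline) : text;
    }

    public static void main(String[] args) {
        String firstLine = firstLineOfZeroFile(64);
        try {
            Integer.parseInt(firstLine);
            System.out.println("parsed (unexpected)");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException, as in the reported trace");
        }
    }
}
```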

        Attachments

        1. KAFKA-1758_2015-05-09_12:29:20.patch
          1 kB
          Manikumar
        2. KAFKA-1758.patch
          1 kB
          Manikumar

            People

            • Assignee:
              omkreddy Manikumar
              Reporter:
              jbrosenberg@gmail.com Jason Rosenberg
              Reviewer:
              Neha Narkhede
            • Votes:
              2
              Watchers:
              9
