Kafka / KAFKA-2012

Broker should automatically handle corrupt index files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1.1
    • Fix Version/s: 0.9.0.0
    • Component/s: None
    • Labels: None

      Description

      We had a number of unclean system shutdowns (power failure), which in many cases corrupted the disks holding our log segments. The broker handles log segment corruption properly (by truncating), but it has problems with corruption in the index files. Additionally, this only seems to happen on some startups (while we are upgrading).

      The broker should just do what I do when I hit a corrupt index file - remove it and rebuild it.

      2015/03/09 17:58:53.873 FATAL [KafkaServerStartable] [main] [kafka-server] [] Fatal error during KafkaServerStartable startup. Prepare to shutdown
      java.lang.IllegalArgumentException: requirement failed: Corrupt index found, index file (/export/content/kafka/i001_caches/__consumer_offsets-39/00000000000000000000.index) has non-zero size but the last offset is -2121629628 and the base offset is 0
      at scala.Predef$.require(Predef.scala:233)
      at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
      at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:185)
      at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
      at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
      at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
      at kafka.log.Log.loadSegments(Log.scala:184)
      at kafka.log.Log.<init>(Log.scala:82)
      at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:141)
      at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
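
The "remove it and rebuild it" behavior requested above can be sketched roughly as follows. This is a minimal illustration in Python, not the broker's Scala code and not the attached patch. It assumes the 0.8.x offset index layout of fixed 8-byte entries (a 4-byte big-endian relative offset followed by a 4-byte file position) and mirrors the check quoted in the error message (a non-empty index whose last offset is not greater than the base offset). The function names and the `rescan` callback, which stands in for re-reading the log segment, are hypothetical.

```python
import os
import struct

ENTRY_SIZE = 8  # assumed layout: 4-byte relative offset + 4-byte file position


def last_offset(index_path, base_offset):
    """Return the absolute offset of the last index entry, or base_offset if empty."""
    size = os.path.getsize(index_path)
    if size < ENTRY_SIZE:
        return base_offset
    with open(index_path, "rb") as f:
        f.seek((size // ENTRY_SIZE - 1) * ENTRY_SIZE)
        relative, _position = struct.unpack(">ii", f.read(ENTRY_SIZE))
    return base_offset + relative


def is_corrupt(index_path, base_offset):
    """A non-empty index is suspect if it is misaligned or its last offset
    is not greater than the base offset."""
    size = os.path.getsize(index_path)
    if size % ENTRY_SIZE != 0:
        return True
    return size > 0 and last_offset(index_path, base_offset) <= base_offset


def rebuild_index(index_path, base_offset, entries):
    """Delete the index and rewrite it from (absolute_offset, position) pairs.

    `entries` is a hypothetical stand-in for a rescan of the log segment.
    """
    if os.path.exists(index_path):
        os.remove(index_path)
    with open(index_path, "wb") as f:
        for offset, position in entries:
            f.write(struct.pack(">ii", offset - base_offset, position))


def load_index(index_path, base_offset, rescan):
    """Check the index at startup; rebuild it instead of failing if it is corrupt."""
    if is_corrupt(index_path, base_offset):
        rebuild_index(index_path, base_offset, rescan())
        return "rebuilt"
    return "ok"
```

With this shape, a corrupt index costs a rescan of one segment at startup rather than a fatal error for the whole broker.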

      1. KAFKA-2012_2015-06-19_18:55:11.patch
        4 kB
        Manikumar
      2. KAFKA-2012_2015-06-19_21:09:22.patch
        4 kB
        Manikumar
      3. KAFKA-2012.patch
        3 kB
        Manikumar

          Activity

          omkreddy Manikumar added a comment -

          Created reviewboard https://reviews.apache.org/r/35503/diff/
          against branch origin/trunk

          omkreddy Manikumar added a comment -

          Updated reviewboard https://reviews.apache.org/r/35503/diff/
          against branch origin/trunk

          junrao Jun Rao added a comment -

          The latest patch looks good to me. Could you rebase since we just committed KAFKA-1646? Thanks,

          omkreddy Manikumar added a comment -

          Updated reviewboard https://reviews.apache.org/r/35503/diff/
          against branch origin/trunk

          junrao Jun Rao added a comment -

          Thanks for the latest patch. +1 and committed to trunk.

          mgharat Mayuresh Gharat added a comment - edited

          Discussed this with Joel Koshy. This patch seems like a workaround and does not actually tell us why the file got corrupted in the first place. We could add a config that turns this code path on or off, so that we can investigate when this happens.
          Let me know; I can open another ticket or use https://issues.apache.org/jira/browse/KAFKA-1554 to add that config.

          This was discussed in KAFKA-1554 :

          Joel Koshy added a comment - 14/Mar/15 01:10
          That would be a work-around, but ideally we should figure out why it happened in the first place.

          Jun Rao added a comment - 09/Apr/15 02:06
          Yes, I am not sure if auto fixing the index is better. People then may not realize if there is an issue. It would be better to figure out what's causing this.

          Thanks,

          Mayuresh

          junrao Jun Rao added a comment -

          I am not sure if it's better to add another config. Perhaps we can just save the corrupted file for troubleshooting, as Gwen suggested in KAFKA-1554.
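
One way to picture the "save the corrupted file" idea is to rename the index aside before rebuilding it, so the bad bytes stay available for inspection. The sketch below is only an illustration in Python, not broker code; the `quarantine` function and the `.corrupt.<timestamp>` suffix are hypothetical choices.

```python
import os
import time


def quarantine(index_path, now=time.time):
    """Rename a corrupt index aside instead of deleting it; return the new path.

    The ".corrupt.<unix-seconds>" suffix is a hypothetical naming scheme.
    """
    target = "%s.corrupt.%d" % (index_path, int(now()))
    os.replace(index_path, target)
    return target
```

The original path is then free for a rebuilt index, and the renamed copy can be examined or removed later.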

          omkreddy Manikumar added a comment -

          I am not sure we can find out the reason after the file is already corrupted, i.e., what can we infer from a corrupted file?
          Instead, we should add some defensive code/logs before and after index reads/writes. It looks like KAFKA-1554 has some steps to reproduce the issue.


            People

            • Assignee: omkreddy Manikumar
            • Reporter: toddpalino Todd Palino
