Kafka / KAFKA-750

inconsistent index offset during broker startup

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: log
    • Labels:

      Description

      Saw the following log during a clean restart of a broker.

      2013/01/29 19:18:12.073 INFO [FileMessageSet] [main] [kafka] [] Creating or reloading log segment /export/content/kafka/i001_caches/topic1-3/00000000000000000000.log
      2013/01/29 19:18:12.074 INFO [OffsetIndex] [main] [kafka] [] Created index file /export/content/kafka/i001_caches/topic1-3/00000000000000000000.index with maxEntries = 655360, maxIndexSize = 10485760, entries = 655360, lastOffset = 0

      A couple of things are weird:
      1. There are entries in the index, but lastOffset is 0.
      2. maxIndexSize/maxEntries = 16, instead of 8.

        Activity

        Jay Kreps added a comment -

        Are you sure this is a clean shutdown? Do we have the log from the broker? Without that it is pretty hard to figure out.

        For issue (1), the lastOffset == 0:

        Seeing lastOffset = 0 can happen in the event of an unclean shutdown. It is not possible to append an actual 0 to the index; we guard against that. If the segment came from an unclean shutdown, this is expected and gets repaired when we run recovery on the segment.
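The guard described above can be illustrated with a minimal sketch (class and method names here are hypothetical, not Kafka's actual OffsetIndex): offsets must be strictly increasing, so a genuine 0 can never be appended to a segment based at 0, and a reported lastOffset of 0 therefore means the index is empty or is stale from an unclean shutdown and awaiting recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not Kafka's code: an index whose append guard
// rejects any offset at or below the last indexed offset.
public class GuardedIndex {
    private long lastOffset;
    private final List<long[]> entries = new ArrayList<>(); // {offset, filePosition}

    public GuardedIndex(long baseOffset) {
        this.lastOffset = baseOffset; // before any append, lastOffset is just the base
    }

    public void append(long offset, long position) {
        // the guard: offsets must be strictly increasing, so appending 0 to a
        // segment based at offset 0 is rejected rather than recorded
        if (offset <= lastOffset)
            throw new IllegalArgumentException(
                "offset " + offset + " is not larger than last offset " + lastOffset);
        entries.add(new long[]{offset, position});
        lastOffset = offset;
    }

    public long lastOffset() { return lastOffset; }
    public int entryCount()  { return entries.size(); }
}
```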

        I checked the logic for clean shutdown. The only time we create a clean shutdown file is LogManager.shutdown:

        def shutdown() {
          debug("Shutting down.")
          try {
            // close the logs
            allLogs.foreach(_.close())
            // mark that the shutdown was clean by creating the clean shutdown marker file
            logDirs.foreach(dir => Utils.swallow(new File(dir, CleanShutdownFile).createNewFile()))
          } finally {
            // regardless of whether the close succeeded, we need to unlock the data directories
            dirLocks.foreach(_.destroy())
          }
          debug("Shutdown complete.")
        }
        I don't think this can fail to call close on the index (which truncates the file) and still write an index file. The close() call is indeed truncating the index.

        There is an issue here, which is that our resize() call does not call flush after truncating the file. This means that a hard OS crash after a clean shutdown could lead to a corrupt index on disk (the truncated file bits could reappear) alongside a clean shutdown file. This is a fairly unlikely problem, though, as it requires a hard OS crash to coincide with a clean shutdown. I don't think that happened here.
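One way to close that window can be sketched as follows, assuming the index is backed by a plain file (the method name resizeAndFlush is illustrative, not Kafka's API): force the channel immediately after truncating, so the new length is durable before the clean shutdown marker is trusted.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch of the flush-after-truncate idea described above; not Kafka's
// actual resize() implementation.
public class IndexResize {
    public static void resizeAndFlush(File indexFile, long newSizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(indexFile, "rw")) {
            raf.setLength(newSizeBytes);   // truncate (or extend) the backing file
            raf.getChannel().force(true);  // fsync data and metadata so the new size survives a crash
        }
    }
}
```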

        That leaves the possibility that the size is somehow getting out of whack with the position in the index. This can be modified in truncateTo or append, and both seem to correctly manage the size and position.

        The second issue of maxEntries vs maxIndexSize is even more curious. The maxIndexSize is a configuration parameter, so it takes whatever is configured at startup time. The maxEntries is set to file_size/8. So if the file was newly allocated this would be impossible, because it would definitely be set exactly. In the event of a clean shutdown any value is possible. The weird thing here is that the value happens to be exactly half the expected maxEntries (maxIndexSize/8). That would be a remarkable coincidence. I cannot explain it.
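The arithmetic behind that observation, using the values from the logged line and the file_size/8 relation above, works out as follows:

```java
// Values taken from the logged OffsetIndex line; the 8-byte entry size comes
// from the file_size/8 relation described in the comment above.
public class IndexMath {
    public static void main(String[] args) {
        long maxIndexSize = 10485760L;    // configured max index size, from the log
        long loggedMaxEntries = 655360L;  // maxEntries reported in the log
        long entryBytes = 8L;             // one index entry is 8 bytes

        long expectedMaxEntries = maxIndexSize / entryBytes;
        System.out.println("expected maxEntries = " + expectedMaxEntries);                  // 1310720
        System.out.println("maxIndexSize / loggedMaxEntries = "
                + (maxIndexSize / loggedMaxEntries));                                       // 16, not 8
        System.out.println("expected / logged = " + (expectedMaxEntries / loggedMaxEntries)); // exactly 2
    }
}
```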

        Jay Kreps added a comment -

        Item number 1 could actually be caused by KAFKA-748, I think.

        Jun Rao added a comment -

        Likely fixed by KAFKA-748 and KAFKA-757.


          People

          • Assignee:
            Jay Kreps
          • Reporter:
            Jun Rao
          • Votes:
            0
          • Watchers:
            2
