Kafka
  1. Kafka
  2. KAFKA-748

Append to index fails due to invalid offset

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      We seem to be appending to the index and it checks to make sure we do not insert an entry with an offset that is larger than the actual offset of the message. We seem to be trying to insert an offset = 1 in the index while lastOffset is 24463. This seems to get fixed on restarting the broker.

      java.lang.IllegalArgumentException: Attempt to append an offset (1) to position 21703 no larger than the last offset appended (24463).
      at kafka.log.OffsetIndex.append(OffsetIndex.scala:183)
      at kafka.log.LogSegment.append(LogSegment.scala:60)
      at kafka.log.Log.append(Log.scala:286)
      at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:188)
      at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:181)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)

      1. outmsg
        3.00 MB
        Sriram Subramanian
      2. outindex
        699 kB
        Sriram Subramanian
      3. KAFKA-748-v1.patch
        6 kB
        Jay Kreps

        Activity

        Neha Narkhede made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jay Kreps made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Jay Kreps added a comment -

        Checked in.

        Jun, I think we effectively do the check you describe already because we run translateOffset on the last index entry in order to initialize the log end offset.

        I am so sure that this can cause the negative position issue.

        Show
        Jay Kreps added a comment - Checked in. Jun, I think we effectively do the check you describe already because we run translateOffset on the last index entry in order to initialize the log end offset. I am so sure that this can cause the negative position issue.
        Hide
        Jun Rao added a comment -

        Thanks for the patch. It looks good. So +1. I suggest that we add one more sanity check after loading all logs during startup: make sure that the position pointed to by the last index entry is less than FileMessageSet.sizeInBytes(). We can probably just do the check for the active log segment.

        Also, do you think this can cause KAFKA-753 too?

        Show
        Jun Rao added a comment - Thanks for the patch. It looks good. So +1. I suggest that we add one more sanity check after loading all logs during startup: make sure that the position pointed to by the last index entry is less than FileMessageSet.sizeInBytes(). We can probably just do the check for the active log segment. Also, do you think this can cause KAFKA-753 too?
        Jay Kreps made changes -
        Attachment KAFKA-748-v1.patch [ 12568498 ]
        Hide
        Jay Kreps added a comment -

        Attached is a patch that retains the index position on resize. It also includes a few useful logging and assert statements.

        Show
        Jay Kreps added a comment - Attached is a patch that retains the index position on resize. It also includes a few useful logging and assert statements.
        Hide
        Jay Kreps added a comment -

        Okay the problem here is the resize method. We generalized index.trimInvalid to index.resize to be able to enlarge the index when we load a segment. This was done to avoid rolling the log on a full index, I think, which had other problems with empty indexes. However it looks like this never actually was tried, because doing this resets the position in the mmap to 0, so we start overwriting the entries in the index from the beginning but expanding what we think the valid segment of the memory map is. When we close we then have a bunch of zeros at the end of the index and have overwritten the front of the file.

        Show
        Jay Kreps added a comment - Okay the problem here is the resize method. We generalized index.trimInvalid to index.resize to be able to enlarge the index when we load a segment. This was done to avoid rolling the log on a full index, I think, which had other problems with empty indexes. However it looks like this never actually was tried, because doing this resets the position in the mmap to 0, so we start overwriting the entries in the index from the beginning but expanding what we think the valid segment of the memory map is. When we close we then have a bunch of zeros at the end of the index and have overwritten the front of the file.
        Hide
        Jay Kreps added a comment -

        This is actually very easy to reproduce. Just do a clean shutdown on the broker while running the producer perf test. Haven't debugged it yet.

        Show
        Jay Kreps added a comment - This is actually very easy to reproduce. Just do a clean shutdown on the broker while running the producer perf test. Haven't debugged it yet.
        Sriram Subramanian made changes -
        Assignee Sriram Subramanian [ sriramsub ] Jay Kreps [ jkreps ]
        Sriram Subramanian made changes -
        Attachment outindex [ 12567905 ]
        Attachment outmsg [ 12567906 ]
        Hide
        Sriram Subramanian added a comment -

        The last few lines in the index file are below. Looks like the offsets are not monotonically increasing. Note that these are the last lines that the DumpLogSegment tool could dump since it fails due to the mismatch. This still does not explain why the append is happening to offset 1. Adding some logging on startup, append and truncate code paths to ensure sanity and shutting down if there is inconsistency should help to get more info.

        offset: 24878 position: 278006637
        offset: 24879 position: 278012656
        offset: 24880 position: 278050159
        offset: 24313 position: 249363783

        Show
        Sriram Subramanian added a comment - The last few lines in the index file are below. Looks like the offsets are not monotonically increasing. Note that these are the last lines that the DumpLogSegment tool could dump since it fails due to the mismatch. This still does not explain why the append is happening to offset 1. Adding some logging on startup, append and truncate code paths to ensure sanity and shutting down if there is inconsistency should help to get more info. offset: 24878 position: 278006637 offset: 24879 position: 278012656 offset: 24880 position: 278050159 offset: 24313 position: 249363783
        Neha Narkhede made changes -
        Labels p1
        Neha Narkhede made changes -
        Field Original Value New Value
        Priority Major [ 3 ] Blocker [ 1 ]
        Sriram Subramanian created issue -

          People

          • Assignee:
            Jay Kreps
            Reporter:
            Sriram Subramanian
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development