KAFKA-5431: LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.1
    • Fix Version/s: 0.11.0.1, 1.0.0
    • Component/s: core

      Description

Hey all,
I have a strange problem with our UAT cluster of 3 Kafka brokers.

The __consumer_offsets topic was replicated to two instances, and our disks ran full due to a wrong configuration of the log cleaner. We fixed the configuration and updated from 0.10.1.1 to 0.10.2.1.
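For reference, the settings in question are the broker-side log cleaner properties, which can be checked like this (illustrative values only, not our actual configuration):

    $ grep '^log.cleaner' config/server.properties   # illustrative, not our actual config
    log.cleaner.enable=true
    log.cleaner.threads=1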

Today I increased the replication factor of the __consumer_offsets topic to 3 and triggered replication to the third broker via kafka-reassign-partitions.sh.
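Roughly like this (broker IDs, the ZooKeeper address, and the file name are placeholders, not my exact invocation):

    $ cat increase-replication.json
    {"version":1,"partitions":[
      {"topic":"__consumer_offsets","partition":18,"replicas":[1,2,3]},
      {"topic":"__consumer_offsets","partition":24,"replicas":[1,2,3]}]}
    $ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
        --reassignment-json-file increase-replication.json --execute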

That went well, but I get many errors like:

      [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,18] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
      [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,24] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
      

I think these errors are due to the disk-full event.
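For what it's worth, the 14 bytes in the error message appear to be the minimum overhead of a legacy (message format v0) record, i.e. the fixed header fields alone (my reading of the broker source, not something the error itself spells out):

    # v0 header: crc(4) + magic(1) + attributes(1) + key length(4) + value length(4)
    $ echo $((4 + 1 + 1 + 4 + 4))
    14

Any entry shorter than that cannot even hold a record header, which would be why the fetcher treats it as corrupt.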

The log cleaner threads died on these corrupt messages:

      [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
      org.apache.kafka.common.errors.CorruptRecordException: Record size is less than the minimum record overhead (14)
      [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
      

Looking at the files, I see that some are truncated and some are just empty:
      $ ls -lsh 00000000000000594653.log
0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00000000000000594653.log
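To inspect a suspect segment, something like this should work (assuming a stock installation; I have not pasted the output here):

    $ bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
        --files 00000000000000594653.log --deep-iteration

On a healthy segment this prints the offsets and message metadata; on a truncated or zeroed-out file it should stop at the corrupt position.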

Sadly, I no longer have the logs from the disk-full event itself.

      I have three questions:

• What is the best way to clean this up? Deleting the old log files and restarting the brokers (a rough sketch follows below)?
• Why did Kafka not handle the disk-full event well? Does this affect only the cleanup, or may we also lose data?
• Is this maybe caused by the combination of the upgrade and the full disk?
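The cleanup I have in mind (first bullet) would look roughly like this; the paths and service name are placeholders, and the partition directories should only be deleted where an intact replica exists on another broker:

    $ systemctl stop kafka                       # or however the broker is managed
    $ rm -rf /var/kafka-logs/__consumer_offsets-18 \
             /var/kafka-logs/__consumer_offsets-24
    $ systemctl start kafka                      # replication restores the partitions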

And last but not least: keep up the good work. Kafka really performs well while being easy to administer, and it has good documentation!


People

• Assignee: huxihx (huxi_2b)
• Reporter: Carsten Rietz (crietz)