[KAFKA-5431] LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.2.1
Fix Version/s: 0.11.0.1, 1.0.0
Component/s: core
Labels:
- reliability

Description

Hey all,
I have a strange problem with our UAT cluster of three Kafka brokers.

The __consumer_offsets topic was replicated to two brokers, and our disks ran full because of a misconfigured log cleaner. We fixed the configuration and upgraded from 0.10.1.1 to 0.10.2.1.

Today I increased the replication factor of the __consumer_offsets topic to 3 and triggered replication to the third broker via kafka-reassign-partitions.sh.
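For reference, kafka-reassign-partitions.sh increases a replication factor by executing a JSON plan that lists the full replica set for every partition. The sketch below generates such a plan in the `{"version": 1, "partitions": [...]}` format the tool reads. The broker IDs 1, 2, 3 and the partition count of 50 (the default `offsets.topic.num.partitions`) are assumptions, not values taken from this report.

```python
import json

# Hypothetical broker IDs; substitute the IDs of your own three brokers.
BROKERS = [1, 2, 3]
# Assumption: the default offsets.topic.num.partitions of 50.
NUM_PARTITIONS = 50


def reassignment_plan(topic: str, num_partitions: int, brokers: list) -> dict:
    """Build a reassignment plan that places every partition on all brokers.

    The replica list is rotated per partition so that the preferred leader
    (the first replica) is spread evenly across the brokers.
    """
    n = len(brokers)
    partitions = []
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % n] for i in range(n)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    plan = reassignment_plan("__consumer_offsets", NUM_PARTITIONS, BROKERS)
    print(json.dumps(plan, indent=2))
```

Redirecting the output to a file (for example `python gen_plan.py > increase-rf.json`, where the script name is hypothetical) gives a file that can be passed to `kafka-reassign-partitions.sh --zookeeper <zk-host:port> --reassignment-json-file increase-rf.json --execute` on 0.10.x, with `--verify` on the same file to check progress. Rotating the replica list avoids making broker 1 the preferred leader for every partition.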

That went well, but I now get many errors like:

[2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,18] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
[2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,24] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)

I think these are caused by the full-disk event.

The log cleaner threads died on these wrong messages:

[2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
org.apache.kafka.common.errors.CorruptRecordException: Record size is less than the minimum record overhead (14)
[2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
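The error text refers to a per-record size check. Assuming the v0 message layout used by 0.10.x, the smallest possible record is a 4-byte CRC, a 1-byte magic, a 1-byte attributes field, a 4-byte key length and a 4-byte value length, i.e. 4 + 1 + 1 + 4 + 4 = 14 bytes, and each log entry is prefixed by an 8-byte offset and a 4-byte size. The simplified Python reader below (a sketch under those assumptions, not Kafka's code; `iter_entries`, `make_entry` and `CorruptRecordError` are hypothetical names) shows that a run of zero bytes, such as the unused tail of a preallocated segment, decodes as an entry whose size field is 0 and therefore fails the check.

```python
import struct

# Assumed v0 record layout: CRC (4) + magic (1) + attributes (1)
# + key length (4) + value length (4) bytes.
RECORD_OVERHEAD_V0 = 4 + 1 + 1 + 4 + 4  # = 14
# Assumed log entry header: offset (int64) + size (int32), big-endian.
LOG_OVERHEAD = 8 + 4


class CorruptRecordError(Exception):
    """Hypothetical stand-in for Kafka's CorruptRecordException."""


def iter_entries(buf: bytes):
    """Yield (offset, record_bytes) for each complete entry in buf.

    Raises CorruptRecordError when a size field is smaller than the minimum
    record overhead, which is what an all-zero header produces.
    """
    pos = 0
    while pos + LOG_OVERHEAD <= len(buf):
        offset, size = struct.unpack_from(">qi", buf, pos)
        if size < RECORD_OVERHEAD_V0:
            raise CorruptRecordError(
                "Record size is less than the minimum record overhead "
                f"({RECORD_OVERHEAD_V0}) at byte {pos}"
            )
        start = pos + LOG_OVERHEAD
        if start + size > len(buf):
            break  # truncated entry at the end of buf: stop without raising
        yield offset, buf[start:start + size]
        pos = start + size


def make_entry(offset: int, payload: bytes) -> bytes:
    """Build a fake entry: payload padded to the minimum size, no real CRC."""
    record = payload.ljust(RECORD_OVERHEAD_V0, b"\x01")
    return struct.pack(">qi", offset, len(record)) + record


if __name__ == "__main__":
    # One valid-looking entry followed by a zero-filled tail.
    segment = make_entry(594653, b"key:value") + bytes(64)
    try:
        for off, rec in iter_entries(segment):
            print("read offset", off, "with", len(rec), "record bytes")
    except CorruptRecordError as err:
        print("reader stopped:", err)
```

In this sketch the first entry is read normally and the reader then raises at byte 26, where the zero-filled region begins. A thread that does not catch such an exception simply ends, which matches the "Stopped" line logged right after the error.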

Looking at the files, I see that some are truncated and some are just empty:
$ ls -lsh 00000000000000594653.log
0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00000000000000594653.log
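In `ls -s` output the first column is the space actually allocated on disk, so `0` next to an apparent size of `100M` means the file has no data blocks and reads back as zeros: it is a sparse file. That is what a segment extended to its full size up front (as the `log.preallocate` broker setting does) and never trimmed back would look like, which is the scenario named in the linked KAFKA-5582. The sketch below, assuming a Unix system where `os.stat` reports `st_blocks` in 512-byte units, flags such files and finds where the non-zero bytes end; `is_sparse` and `data_end` are hypothetical helpers.

```python
import os
import tempfile


def is_sparse(path: str) -> bool:
    """True if fewer bytes are allocated on disk than the file's apparent size.

    Unix only: st_blocks counts 512-byte units, the same allocation that
    `ls -s` reports in its first column.
    """
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size


def data_end(path: str, chunk: int = 1 << 20) -> int:
    """Return the position just past the last non-zero byte (0 if all zeros)."""
    end = 0
    pos = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            stripped = block.rstrip(b"\x00")
            if stripped:
                end = pos + len(stripped)
            pos += len(block)
    return end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "00000000000000000000.log")
        with open(path, "wb") as f:
            f.write(b"some record bytes")
            # Extend to 1 MiB without writing data; most Linux filesystems
            # leave the extension unallocated, so the file becomes sparse.
            f.truncate(1 << 20)
        print("size:", os.path.getsize(path), "sparse:", is_sparse(path),
              "non-zero data ends at byte:", data_end(path))
```

`data_end` is only a byte-level heuristic: a valid record can legitimately end in zero bytes, so a safe truncation point should come from walking the entry headers rather than from this scan. For a file like `00000000000000594653.log` above, with 0 blocks allocated, one would expect `is_sparse` to return `True` and `data_end` to return 0.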

Sadly, I no longer have the logs from the disk-full event itself.

I have three questions:

- What is the best way to clean this up? Should we delete the old log files and restart the brokers?
- Why did Kafka not handle the full disk gracefully? Does this only affect log cleanup, or could we also lose data?
- Could this be caused by the combination of the upgrade and the full disk?

And last but not least: keep up the good work. Kafka performs really well, is easy to administer, and has good documentation!

Issue Links

is duplicated by: KAFKA-5582 Log compaction with preallocation enabled does not trim segments (Resolved)
links to: GitHub Pull Request #3525

People

Assignee: huxihx
Reporter: Carsten Rietz
Votes: 0
Watchers: 9

Dates

Created: 12/Jun/17 10:25
Updated: 25/Jul/21 17:33
Resolved: 21/Jul/17 04:49