Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version: 1.1.0
Description
While rolling a log segment, one of our Kafka clusters hit an immediate read error on the same partition. This led to a flood of log messages containing the corresponding stack traces. Data was still appended to the partition, but consumers were unable to read from it. The reason for the exception is unclear.
[2018-07-02 23:53:32,732] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,739] INFO [ProducerStateManager partition=ingestion-3] Writing producer snapshot at offset 971865991 (kafka.log.ProducerStateManager)
[2018-07-02 23:53:32,739] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,750] ERROR [ReplicaManager broker=1] Error processing fetch operation on partition ingestion-3, offset 971865977 (kafka.server.ReplicaManager)
Caused by: java.io.EOFException: Failed to read `log header` from file channel `sun.nio.ch.FileChannelImpl@2e0e8810`. Expected to read 17 bytes, but reached end of file after reading 0 bytes. Started read from position 2147483643.
We mitigated the issue by stopping the affected node and deleting the corresponding directory. Once the partition was recreated for the replica (we use replication factor 2), the other replica experienced the same problem, and we mitigated it the same way.
It is unclear to us what caused this issue. Can you help us find the root cause of this problem?
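For context: the failing read position 2147483643 in the error above is only four bytes below Integer.MAX_VALUE (2147483647), which matches the integer-overflow failure mode described in the linked KAFKA-6292. A minimal sketch (hypothetical variable names, not Kafka's actual code) of how adding a record-header size to an `int` file position near that limit wraps to a negative, invalid position:

```java
public class PositionOverflowSketch {
    public static void main(String[] args) {
        int position = 2147483643;   // read position reported in the error log
        int headerSize = 17;         // bytes of log header the broker tried to read

        // int arithmetic silently wraps past Integer.MAX_VALUE (2147483647)
        int nextInt = position + headerSize;
        System.out.println(nextInt); // prints -2147483636 (wrapped, invalid position)

        // widening one operand to long before adding avoids the wrap
        long nextLong = (long) position + headerSize;
        System.out.println(nextLong); // prints 2147483660
    }
}
```

This is only an illustration of the arithmetic; the actual fix landed in the FileLogInputStream code referenced by KAFKA-6292.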
Attachments
Issue Links
- duplicates KAFKA-6292: KafkaConsumer ran into Unknown error fetching data for topic-partition caused by integer overflow in FileLogInputStream (Resolved)