Details
-
Bug
-
Status: Patch Available
-
Minor
-
Resolution: Unresolved
-
3.6.0
-
None
-
None
Description
This reproduces when reading from remote storage with the default fetch.max.wait.ms (500) or lower. isolation.level=read_committed is needed to trigger this.
It's not easy to reproduce on local-only setups, unfortunately, because reads are fast and aren't interrupted.
This error is logged
[2023-11-29 14:01:01,166] ERROR Error occurred while reading the remote data for topic1-0 (kafka.log.remote.RemoteLogReader) org.apache.kafka.common.KafkaException: Failed read position from the transaction index <index_file_name> at org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:235) at org.apache.kafka.storage.internals.log.TransactionIndex.collectAbortedTxns(TransactionIndex.java:171) at kafka.log.remote.RemoteLogManager.collectAbortedTransactions(RemoteLogManager.java:1359) at kafka.log.remote.RemoteLogManager.addAbortedTransactions(RemoteLogManager.java:1341) at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1310) at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62) at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.nio.channels.ClosedChannelException at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150) at java.base/sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:325) at org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:233) ... 10 more
and after that this txn index becomes unusable until the process is restarted.
I suspect, it's caused by the reading thread being interrupted due to the fetch timeout. At least this code in AbstractInterruptibleChannel is called.
Fixing may be easy: reopen the channel in TransactionIndex if it's close. However, off the top of my head I can't say if there are some less obvious implications of this change.
Attachments
Issue Links
- is a child of
-
KAFKA-16947 Kafka Tiered Storage V2
- Open
- relates to
-
KAFKA-15772 Flaky test TransactionsWithTieredStoreTest
- Open
- links to