Description
After upgrading our connector environment to 2.9.0-SNAPSHOT, the Connect cluster sometimes encounters the following error:
Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:324)
org.apache.kafka.connect.errors.ConnectException: Error while getting end offsets for topic 'connect-storage-topic-connect-cluster-1'
at org.apache.kafka.connect.util.TopicAdmin.endOffsets(TopicAdmin.java:689)
at org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:338)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:195)
at org.apache.kafka.connect.storage.KafkaStatusBackingStore.start(KafkaStatusBackingStore.java:216)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:129)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:310)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at org.apache.kafka.connect.util.TopicAdmin.endOffsets(TopicAdmin.java:668)
... 10 more
https://github.com/apache/kafka/pull/9780 added a shared admin client to get end offsets. KafkaAdmin#listOffsets does not handle topic-level errors, so a topic-level UnknownTopicOrPartitionException can prevent the worker from starting when a new internal topic has NOT yet been synced to all brokers.
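One possible way to tolerate this race is to retry the end-offset lookup while the topic is still propagating. The sketch below is only an illustration of that idea, not the fix that was applied in Kafka. It has no Kafka dependency: the nested UnknownTopicOrPartitionException is a local stand-in for org.apache.kafka.common.errors.UnknownTopicOrPartitionException, and EndOffsetsRetry, retryWhileTopicUnknown, maxAttempts and backoffMs are hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

public class EndOffsetsRetry {

    // Local stand-in (assumption) for Kafka's UnknownTopicOrPartitionException,
    // so the sketch compiles without the Kafka client on the classpath.
    static class UnknownTopicOrPartitionException extends RuntimeException {
        UnknownTopicOrPartitionException(String message) {
            super(message);
        }
    }

    // Runs the lookup up to maxAttempts times. Only an ExecutionException whose
    // cause is UnknownTopicOrPartitionException is retried; any other failure
    // is rethrown immediately.
    static <T> T retryWhileTopicUnknown(Callable<T> lookup, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ExecutionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.call();
            } catch (ExecutionException e) {
                if (!(e.getCause() instanceof UnknownTopicOrPartitionException)) {
                    throw e; // not the topic-propagation race, so do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // give the brokers time to learn about the topic
                }
            }
        }
        throw last; // every attempt hit the unknown-topic error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated lookup: fails twice as in the stack trace above, then succeeds.
        Long endOffset = retryWhileTopicUnknown(() -> {
            if (++calls[0] < 3) {
                throw new ExecutionException(new UnknownTopicOrPartitionException(
                        "This server does not host this topic-partition."));
            }
            return 42L;
        }, 5, 10L);
        System.out.println("end offset " + endOffset + " after " + calls[0] + " attempts");
    }
}
```

In a real worker the lambda would wrap the admin call that fetches end offsets, and the retry budget would need to fit within the worker's startup timeouts.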
Issue Links
- causes: KAFKA-12879 Compatibility break in Admin.listOffsets() (Resolved)
- is caused by: KAFKA-10021 When reading to the end of the config log, check if fetch.max.wait.ms is greater than worker.sync.timeout.ms (Resolved)