Description
I have a Kafka Connect cluster with three workers running on Kubernetes. The workers communicate with each other using their pods' IPs (internal IPs, 192.X.X.X). Sometimes pods are rescheduled onto different nodes. I am not sure whether this is related to the issue, but I think it causes the pods' IPs to change, which makes Kafka Connect rebalance.
Occasionally, tasks fail with a NullPointerException (NPE).
From the connectors/:connector/status REST API, I can see this trace:
    at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:517)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1258)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:127)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1273)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1269)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
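For anyone trying to reproduce this, here is a minimal Python sketch that reads the same status endpoint and lists only the FAILED tasks. For each one it reports the worker_id (the worker's host:port, which here would be a pod IP) and the first line of the trace. The sketch assumes the REST listener is reachable at http://localhost:8083 (8083 is Connect's default REST port; CONNECT_URL is a placeholder to adjust for your Service or pod). It uses only the standard library. fetch_status and failed_tasks are illustrative helper names, not part of Kafka.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumption: placeholder address of one Connect worker's REST listener.
CONNECT_URL = "http://localhost:8083"


def failed_tasks(status: dict) -> List[Tuple[int, str, str]]:
    """Return (task id, worker_id, first trace line) for every FAILED task
    in a parsed /connectors/:connector/status response."""
    result = []
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            trace = task.get("trace", "")
            # The first trace line normally names the exception class.
            first_line = trace.splitlines()[0] if trace else ""
            result.append((task["id"], task.get("worker_id", ""), first_line))
    return result


def fetch_status(connector: str) -> dict:
    """GET /connectors/{connector}/status and decode the JSON body."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{CONNECT_URL}/connectors/{name}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, call failed_tasks(fetch_status("my-connector")), where "my-connector" is a hypothetical connector name. Parsing is kept separate from the HTTP call, so the same filter also works on a status response saved earlier, such as one captured right after a rebalance.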
It looks like the issue is similar to KAFKA-10323 and
It seems the NPE is thrown from here.
Issue Links
- duplicates: KAFKA-10323 NullPointerException during rebalance (Open)