Details
Type: Task
Status: Closed
Priority: Critical
Resolution: Not A Problem
Description
From community:
Hi guys, I'm trying to implement this architecture with Hudi:
DB table --> Debezium --> Kafka --> Hudi sink connector --> S3 bucket
My settings:
Kafka version 2.4
Hudi version 0.10.1
HDFS sink connector version 10.1.4
I'm encountering this error:
ERROR WorkerSinkTask{id=<XXX>} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:312)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 9 more
This is the Dockerfile I used to build the custom image:
#==================
FROM maven:3.8.4-openjdk-8-slim as build-hudi

ENV HUDI_VERSION=0.10.1

RUN mkdir /home/hudi && \
    curl -L https://github.com/apache/hudi/archive/refs/tags/release-$HUDI_VERSION.tar.gz \
    > hudi-release-$HUDI_VERSION.tar.gz && \
    tar -xzvf ./hudi-release-$HUDI_VERSION.tar.gz -C /home/hudi && \
    rm ./hudi-release-$HUDI_VERSION.tar.gz && \
    cd /home/hudi/hudi-release-$HUDI_VERSION && \
    mvn package -DskipTests -pl packaging/hudi-kafka-connect-bundle -am

#==================
FROM confluentinc/cp-kafka-connect:7.0.1

ENV DEBEZIUM_VERSION=1.4.1.Final \
    MAVEN_REPO_CORE="https://repo1.maven.org/maven2" \
    CONNECTOR=mysql \
    KAFKA_CONNECT_PLUGINS_DIR=/usr/share/java \
    DATAGEN_VERSION=0.5.3 \
    ADX_SINK_CONNECTOR_VERSION=2.2.0 \
    AMAZON_S3_SINK_CONNECTOR_VERSION=10.0.3 \
    HDFS2_SINK_CONNECTOR_VERSION=10.1.4 \
    HUDI_OUTPUT_JAR_FILE="hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar" \
    HUDI_VERSION=0.10.1

RUN curl -fSL -o /tmp/plugin.tar.gz \
    $MAVEN_REPO_CORE/io/debezium/debezium-connector-$CONNECTOR/$DEBEZIUM_VERSION/debezium-connector-$CONNECTOR-$DEBEZIUM_VERSION-plugin.tar.gz && \
    tar -xzf /tmp/plugin.tar.gz -C $KAFKA_CONNECT_PLUGINS_DIR && \
    rm -f /tmp/plugin.tar.gz

RUN confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:$DATAGEN_VERSION && \
    confluent-hub install --no-prompt microsoftcorporation/kafka-sink-azure-kusto:$ADX_SINK_CONNECTOR_VERSION && \
    confluent-hub install --no-prompt confluentinc/kafka-connect-s3:$AMAZON_S3_SINK_CONNECTOR_VERSION && \
    confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:$HDFS2_SINK_CONNECTOR_VERSION

COPY --from=build-hudi /home/hudi/hudi-release-$HUDI_VERSION/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-$HUDI_VERSION.jar $KAFKA_CONNECT_PLUGINS_DIR/$HUDI_OUTPUT_JAR_FILE
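One way to narrow down a NoClassDefFoundError like the one in the report is to check whether any jar under the Connect plugin directory actually contains the missing class. The following is a minimal diagnostic sketch, not part of Hudi or Kafka Connect: the helper name find_class_in_jars is hypothetical, and it assumes the plugin directory is /usr/share/java, the value of KAFKA_CONNECT_PLUGINS_DIR in the Dockerfile.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(plugin_dir, class_name):
    """Return the jars under plugin_dir that contain class_name.

    Hypothetical helper for diagnosis only; class_name is a dotted
    Java name such as org.apache.hadoop.fs.FSDataInputStream.
    """
    root = Path(plugin_dir)
    if not root.is_dir():
        return []

    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"

    matches = []
    for jar in sorted(root.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid archives.
            continue
    return matches
```

Inside the container this could be called as find_class_in_jars("/usr/share/java", "org.apache.hadoop.fs.FSDataInputStream"). An empty list would be consistent with the ClassNotFoundException in the trace. A hit in another connector's directory does not necessarily help, because Kafka Connect loads each plugin through an isolated class loader (the PluginClassLoader frame in the trace).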