Apache Hudi / HUDI-3610

Validate Hudi Kafka Connect Sink writing to S3

Details

    • Type: Task
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: kafka-connect
    • Labels: None
    • Story Points: 2

    Description

      From community:
      Hi guys, I'm trying to implement this architecture with Hudi:
      DB table --> Debezium --> Kafka --> Hudi sink connector --> S3 bucket
      My setup:
      Kafka version 2.4
      Hudi version 0.10.1
      HDFS sink connector version 10.1.4
      I'm encountering this error:

      ERROR WorkerSinkTask{id=<XXX>} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
      java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
      at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
      at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:312)
      at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
      at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
      at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
      at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
      at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
      at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
      ... 9 more 

      This is the Dockerfile I used to build the custom image:

      #==================
      
      FROM maven:3.8.4-openjdk-8-slim as build-hudi
      ENV HUDI_VERSION=0.10.1
      
      RUN mkdir /home/hudi && \
          curl -L https://github.com/apache/hudi/archive/refs/tags/release-$HUDI_VERSION.tar.gz \
          > hudi-release-$HUDI_VERSION.tar.gz && \
          tar -xzvf ./hudi-release-$HUDI_VERSION.tar.gz -C /home/hudi && \
          rm ./hudi-release-$HUDI_VERSION.tar.gz && \
          cd /home/hudi/hudi-release-$HUDI_VERSION && \
          mvn package -DskipTests -pl packaging/hudi-kafka-connect-bundle -am
      #==================
      
      FROM confluentinc/cp-kafka-connect:7.0.1
      
      ENV DEBEZIUM_VERSION=1.4.1.Final \
          MAVEN_REPO_CORE="https://repo1.maven.org/maven2" \
          CONNECTOR=mysql \
          KAFKA_CONNECT_PLUGINS_DIR=/usr/share/java \
          DATAGEN_VERSION=0.5.3 \
          ADX_SINK_CONNECTOR_VERSION=2.2.0 \
          AMAZON_S3_SINK_CONNECTOR_VERSION=10.0.3 \
          HDFS2_SINK_CONNECTOR_VERSION=10.1.4 \
          HUDI_OUTPUT_JAR_FILE="hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar" \
          HUDI_VERSION=0.10.1
      
      
      
      RUN curl -fSL -o /tmp/plugin.tar.gz \
        $MAVEN_REPO_CORE/io/debezium/debezium-connector-$CONNECTOR/$DEBEZIUM_VERSION/debezium-connector-$CONNECTOR-$DEBEZIUM_VERSION-plugin.tar.gz && \
        tar -xzf /tmp/plugin.tar.gz -C $KAFKA_CONNECT_PLUGINS_DIR && \
        rm -f /tmp/plugin.tar.gz
      
      RUN confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:$DATAGEN_VERSION && \
          confluent-hub install --no-prompt microsoftcorporation/kafka-sink-azure-kusto:$ADX_SINK_CONNECTOR_VERSION && \
          confluent-hub install --no-prompt confluentinc/kafka-connect-s3:$AMAZON_S3_SINK_CONNECTOR_VERSION && \
          confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:$HDFS2_SINK_CONNECTOR_VERSION
      
      
      
      COPY --from=build-hudi /home/hudi/hudi-release-$HUDI_VERSION/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-$HUDI_VERSION.jar $KAFKA_CONNECT_PLUGINS_DIR/$HUDI_OUTPUT_JAR_FILE 
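
The `NoClassDefFoundError` above names `org/apache/hadoop/fs/FSDataInputStream`, which suggests that no Hadoop jar providing that class was visible to the Hudi sink at startup. One way to check this, sketched below in Python as an illustration only, is to scan the plugin directory for jars that contain the class. The helper name `jars_containing` is hypothetical and not part of Hudi or Kafka Connect; the default class entry is taken from the stack trace.

```python
import zipfile
from pathlib import Path

# Class named in the NoClassDefFoundError, written as a jar entry path.
MISSING_CLASS = "org/apache/hadoop/fs/FSDataInputStream.class"


def jars_containing(plugin_dir, class_entry=MISSING_CLASS):
    """Return sorted paths of jars under plugin_dir that contain class_entry.

    Hypothetical diagnostic helper: it only inspects archive contents and
    does not reproduce how Kafka Connect actually resolves classes.
    """
    hits = []
    for jar in sorted(Path(plugin_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if class_entry in archive.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid zip archives.
            continue
    return hits
```

Calling `jars_containing("/usr/share/java")` inside the image (the value of `KAFKA_CONNECT_PLUGINS_DIR` in the Dockerfile) lists the jars that ship the class. An empty list would be consistent with the error. A hit in another connector's directory may still not help, because the trace shows the class being loaded through Kafka Connect's `PluginClassLoader`, which isolates plugins from one another.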

          People

            rmahindra Rajesh Mahindra
            guoyihua Ethan Guo