Apache Hudi
HUDI-4913

HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.1
    • Component/s: None

    Description

      When using HoodieSnapshotExporter to export a Hudi dataset on S3 to a different bucket, i.e., when the --source-base-path and the --target-output-path are in different buckets, an IllegalArgumentException is thrown:

       

      ./bin/spark-submit \
        --master yarn \
        --deploy-mode client \
        --driver-memory 10g \
        --executor-memory 10g \
        --num-executors 1 \
        --executor-cores 4 \
        --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.kryoserializer.buffer=256m \
        --conf spark.kryoserializer.buffer.max=1024m \
        --conf spark.rdd.compress=true \
        --conf spark.memory.storageFraction=0.8 \
        --conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC" \
        --conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC" \
        --conf spark.ui.proxyBase="" \
        --conf 'spark.eventLog.enabled=true' --conf 'spark.eventLog.dir=hdfs:///var/log/spark/apps' \
        --conf spark.hadoop.yarn.timeline-service.enabled=false \
        --conf spark.driver.userClassPathFirst=true \
        --conf spark.executor.userClassPathFirst=true \
        --conf "spark.sql.hive.convertMetastoreParquet=false" \
        --conf spark.sql.catalogImplementation=in-memory \
        --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
        --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
        --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
            /home/hadoop/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
        --source-base-path "s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/" \
        --target-output-path "s3a://ethan-tmp/backup/" \
        --output-format "hudi"

        

      Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS s3a://ethan-tmp//backup -expected s3a://ethan-lakehouse-us-east-2
          at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1155)
          at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:666)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1117)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1143)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3078)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
          at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
          at org.apache.hudi.utilities.HoodieSnapshotExporter.outputPathExists(HoodieSnapshotExporter.java:145)
          at org.apache.hudi.utilities.HoodieSnapshotExporter.export(HoodieSnapshotExporter.java:120)
          at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:275)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
          at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
          at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
          at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
          at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
          at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
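Reading the stack trace, the failure appears to come from `HoodieSnapshotExporter.outputPathExists` calling `exists()` on an `S3AFileSystem` that is bound to the source bucket, while passing it the target path; Hadoop's `S3xLoginHelper.checkPath` rejects any path whose scheme/authority differs from the FileSystem's own URI. The following is a minimal stdlib-only sketch (not actual Hudi or Hadoop code; `WrongFsDemo` and `checkPath` are illustrative names) mimicking that check to show why a cross-bucket target trips it:

```java
import java.net.URI;

public class WrongFsDemo {
    // Mimics the Hadoop FileSystem path check: a FileSystem instance is bound
    // to one scheme + authority (for S3A, one bucket) and rejects any path
    // that belongs to a different one.
    static void checkPath(URI fsUri, URI path) {
        boolean sameScheme = fsUri.getScheme().equals(path.getScheme());
        boolean sameAuthority = fsUri.getAuthority().equals(path.getAuthority());
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                "Wrong FS " + path + " -expected " + fsUri);
        }
    }

    public static void main(String[] args) {
        // FileSystem obtained from the source base path is bound to the source bucket.
        URI sourceFs = URI.create("s3a://ethan-lakehouse-us-east-2");

        // A path in the same bucket passes the check.
        checkPath(sourceFs, URI.create("s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow"));

        // A path in a different bucket throws, as in the report.
        try {
            checkPath(sourceFs, URI.create("s3a://ethan-tmp/backup"));
        } catch (IllegalArgumentException e) {
            // prints: Wrong FS s3a://ethan-tmp/backup -expected s3a://ethan-lakehouse-us-east-2
            System.out.println(e.getMessage());
        }
    }
}
```

The implied fix is to resolve a FileSystem from the target path's own URI (e.g. via `FileSystem.get(targetUri, conf)`) before the existence check, rather than reusing the source's FileSystem instance.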

       


            People

              Assignee: Unassigned
              Reporter: Ethan Guo (guoyihua)
