Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Description
When HoodieSnapshotExporter is used to export a Hudi dataset on S3 to a different bucket, i.e., when --source-base-path and --target-output-path point to different buckets, an IllegalArgumentException is thrown. The command and the resulting stack trace:
./bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 10g \
  --executor-memory 10g \
  --num-executors 1 \
  --executor-cores 4 \
  --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer=256m \
  --conf spark.kryoserializer.buffer.max=1024m \
  --conf spark.rdd.compress=true \
  --conf spark.memory.storageFraction=0.8 \
  --conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC" \
  --conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC" \
  --conf spark.ui.proxyBase="" \
  --conf 'spark.eventLog.enabled=true' \
  --conf 'spark.eventLog.dir=hdfs:///var/log/spark/apps' \
  --conf spark.hadoop.yarn.timeline-service.enabled=false \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --conf spark.sql.catalogImplementation=in-memory \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
  /home/hadoop/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
  --source-base-path "s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/" \
  --target-output-path "s3a://ethan-tmp/backup/" \
  --output-format "hudi"
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS s3a://ethan-tmp//backup -expected s3a://ethan-lakehouse-us-east-2
    at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1155)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:666)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1117)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1143)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3078)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
    at org.apache.hudi.utilities.HoodieSnapshotExporter.outputPathExists(HoodieSnapshotExporter.java:145)
    at org.apache.hudi.utilities.HoodieSnapshotExporter.export(HoodieSnapshotExporter.java:120)
    at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:275)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
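
The trace shows outputPathExists calling exists() on a file system instance that expects s3a://ethan-lakehouse-us-east-2 (the source bucket) while being handed a path in s3a://ethan-tmp (the target bucket). The following is a minimal, self-contained sketch of that failure mode using only java.net.URI. It is a simplified model and not Hadoop's or Hudi's actual code: the class WrongFsSketch, the type BoundFs, the helper fsFor, and the bucket names are all hypothetical.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Hypothetical stand-in for a file system client bound to one scheme and bucket. */
    static final class BoundFs {
        private final URI root;

        BoundFs(String root) {
            this.root = URI.create(root);
        }

        /** Rejects paths whose scheme or authority (bucket) differs from the bound root. */
        void checkPath(String path) {
            URI p = URI.create(path);
            boolean sameScheme = Objects.equals(root.getScheme(), p.getScheme());
            boolean sameAuthority = Objects.equals(root.getAuthority(), p.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS " + path + " -expected " + root);
            }
        }
    }

    /** Resolves the client from the path being checked instead of reusing another path's client. */
    static BoundFs fsFor(String path) {
        URI p = URI.create(path);
        return new BoundFs(p.getScheme() + "://" + p.getAuthority());
    }

    public static void main(String[] args) {
        // Placeholder bucket names, not the ones from the report.
        String source = "s3a://source-bucket/hudi/table/";
        String target = "s3a://target-bucket/backup/";

        BoundFs sourceFs = fsFor(source);
        try {
            // Reusing the source-bound client for a target path is rejected.
            sourceFs.checkPath(target);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // A client resolved from the target path itself accepts that path.
        fsFor(target).checkPath(target);
        System.out.println("accepted: " + target);
    }
}
```

In Hadoop, the usual way to obtain a file system that matches a given path is Path.getFileSystem(Configuration). Whether the actual fix for this issue took that form is not stated here.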