Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
If a Spark job inserts or insert-overwrites empty data into a Hudi table using bulk insert, an exception such as 'java.lang.IllegalArgumentException: Positive number of partitions required' is thrown.
With bulk insert overwrite, the stack trace looks like:
Positive number of partitions required
java.lang.IllegalArgumentException: Positive number of partitions required
    at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:118)
    at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:96)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.rdd.RDD.$anonfun$distinct$11(RDD.scala:470)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
    at org.apache.spark.rdd.RDD.distinct(RDD.scala:470)
    at org.apache.spark.api.java.JavaRDD.distinct(JavaRDD.scala:85)
    at org.apache.hudi.data.HoodieJavaRDD.distinct(HoodieJavaRDD.java:157)
    at org.apache.hudi.commit.DatasetBulkInsertOverwriteCommitActionExecutor.getPartitionToReplacedFileIds(DatasetBulkInsertOverwriteCommitActionExecutor.java:61)
    at org.apache.hudi.commit.BaseDatasetBulkInsertCommitActionExecutor.lambda$buildHoodieWriteMetadata$0(BaseDatasetBulkInsertCommitActionExecutor.java:83)
    at org.apache.hudi.common.util.Option.map(Option.java:108)
    at org.apache.hudi.commit.BaseDatasetBulkInsertCommitActionExecutor.buildHoodieWriteMetadata(BaseDatasetBulkInsertCommitActionExecutor.java:80)
    at org.apache.hudi.commit.BaseDatasetBulkInsertCommitActionExecutor.execute(BaseDatasetBulkInsertCommitActionExecutor.java:102)
    at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:908)
    at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:407)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
    at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
    at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
With plain bulk insert, the stack trace looks like:
Caused by: org.apache.hudi.exception.HoodieException: Failed to update metadata
    at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:92)
    at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.commit(HoodieDataSourceInternalBatchWrite.java:92)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:369)
    ... 138 more
Caused by: org.apache.hudi.exception.HoodieException: Failed to update metadata
    at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:367)
    at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:285)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:211)
    at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:89)
    ... 140 more
Caused by: java.lang.IllegalArgumentException: Positive number of partitions required
    at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:118)
    at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:96)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
    at org.apache.spark.SparkContext.$anonfun$union$2(SparkContext.scala:1410)
    at org.apache.spark.SparkContext.$anonfun$union$2$adapted(SparkContext.scala:1410)
    at scala.collection.TraversableLike.noneIn$1(TraversableLike.scala:271)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:337)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
    at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1410)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:792)
    at org.apache.spark.SparkContext.union(SparkContext.scala:1409)
    at org.apache.spark.SparkContext.$anonfun$union$5(SparkContext.scala:1421)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:792)
    at org.apache.spark.SparkContext.union(SparkContext.scala:1421)
    at org.apache.spark.rdd.RDD.$anonfun$union$1(RDD.scala:665)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
    at org.apache.spark.rdd.RDD.union(RDD.scala:665)
    at org.apache.spark.api.java.JavaRDD.union(JavaRDD.scala:177)
    at org.apache.hudi.data.HoodieJavaRDD.union(HoodieJavaRDD.java:172)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.lambda$update$21(HoodieBackedTableMetadataWriter.java:918)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:853)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:910)
    at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:362)
    ... 144 more
A similar exception is thrown when using Spark 2.x.
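For reference, a minimal way to reproduce this kind of failure is to bulk-insert a zero-row DataFrame into a Hudi table. The snippet below is a sketch only: the table name, schema, base path, and local Spark settings are placeholders, not details taken from this report.

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types._

object EmptyBulkInsertRepro {
  def main(args: Array[String]): Unit = {
    // Local session; Hudi requires the Kryo serializer for its write path.
    val spark = SparkSession.builder()
      .appName("hudi-empty-bulk-insert-repro")
      .master("local[2]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Placeholder schema with a record key, precombine field, and partition field.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("ts", LongType),
      StructField("dt", StringType)))

    // Zero-row DataFrame: bulk inserting this is what triggers
    // "Positive number of partitions required".
    val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    emptyDf.write.format("hudi")
      .option("hoodie.table.name", "empty_bulk_insert_tbl")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.partitionpath.field", "dt")
      .mode(SaveMode.Overwrite)
      .save("/tmp/empty_bulk_insert_tbl")

    spark.stop()
  }
}

The same empty-input condition can also be hit from Spark SQL (INSERT INTO / INSERT OVERWRITE with bulk insert enabled), which is the path shown in the stack traces above; the DataFrame write here goes through the same HoodieSparkSqlWriter bulk-insert-as-row code.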
Attachments
Issue Links
- links to