Apache Hudi: HUDI-5797

bulk insert as row will throw error without mdt init


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Component/s: spark

    Description

      `bulk insert as row` does not call initTable first, so the metadata table (MDT) is initialized when the commit result is written in the same job. That initialization lists files through the fileSystem, and the listing can contain orphan or corrupt files. For example, if a writer is killed by the RM before flushing, the parquet file size may be 0, which triggers the following error when the MDT is initialized.

       

      Job aborted due to stage failure: Task 1 in stage 13.0 failed 4 times, most recent failure: Lost task 1.3 in stage 13.0 (TID 102100) (bigdata-nmg-hdp10339.nmg01.diditaxi.com executor 832): java.lang.IllegalStateException
      	at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:53)
      	at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$null$4(HoodieMetadataPayload.java:328)
      	at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
      	at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
      	at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1683)
      	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
      	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
      	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
      	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
      	at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$createPartitionFilesRecord$5(HoodieMetadataPayload.java:323)
      	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
      	at org.apache.hudi.metadata.HoodieMetadataPayload.createPartitionFilesRecord(HoodieMetadataPayload.java:321)
      	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.lambda$getFilesPartitionRecords$f70c2081$1(HoodieBackedTableMetadataWriter.java:1105)
      	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
      	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1892)
      	at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1249)
      	at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1249)
      	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2261)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
      	at org.apache.spark.scheduler.Task.run(Task.scala:131)
      	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1463)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
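
      The IllegalStateException at ValidationUtils.checkState can be illustrated with a minimal, self-contained sketch. The checkState helper below mirrors org.apache.hudi.common.util.ValidationUtils.checkState; the file names, sizes, and the recordPartitionFiles method are hypothetical stand-ins for the per-partition record build in HoodieMetadataPayload, assumed here to require a positive file size.

      ```java
      import java.util.LinkedHashMap;
      import java.util.Map;

      public class MdtInitCheck {

          // Mirrors ValidationUtils.checkState: throws IllegalStateException with no message.
          static void checkState(boolean expression) {
              if (!expression) {
                  throw new IllegalStateException();
              }
          }

          // Hypothetical stand-in for building the files-partition record:
          // every file picked up by the fileSystem listing must have size > 0.
          static void recordPartitionFiles(Map<String, Long> files) {
              files.forEach((name, size) -> checkState(size > 0));
          }

          public static void main(String[] args) {
              Map<String, Long> files = new LinkedHashMap<>();
              files.put("part-0001.parquet", 4096L);
              // Orphan file left by a writer killed by the RM before flush.
              files.put("part-0002.parquet", 0L);

              try {
                  recordPartitionFiles(files);
              } catch (IllegalStateException e) {
                  System.out.println("MDT init fails on the 0-byte file, as in the stack trace");
              }
          }
      }
      ```

      Calling initTable before the bulk insert (or cleaning up such 0-byte files) avoids building the MDT from a listing that still contains them.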
      



            People

              Assignee: KnightChess
              Reporter: KnightChess
              Votes: 0
              Watchers: 1
