Apache Hudi / HUDI-1740

First replacecommit from insert_overwrite_table and insert_overwrite has empty partitionToReplaceFileIds

Details

    Description

      The first replacecommit written by insert_overwrite_table and insert_overwrite has an empty partitionToReplaceFileIds, which breaks the archival code.

      Fix

      The archival code should only proceed if partitionToReplaceFileIds is not empty.
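
      A minimal sketch of that guard as a standalone helper (class and method names are illustrative, not the actual Hudi archival code):

      import java.util.List;
      import java.util.Map;

      public class ReplaceCommitArchivalGuard {
        // Illustrative check mirroring the described fix: run the
        // replacecommit-specific archival conversion only when the
        // partitionToReplaceFileIds map is actually populated.
        static boolean shouldProceed(Map<String, List<String>> partitionToReplaceFileIds) {
          return partitionToReplaceFileIds != null && !partitionToReplaceFileIds.isEmpty();
        }
      }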

       

      Update: Archival also breaks on a requested/inflight commit in 0.9.0-SNAPSHOT. This was not an issue in 0.7.0, so this Jira ticket fixes two things. Please refer to the detailed description in the PR.
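
      That second failure surfaces as the java.util.NoSuchElementException in the stack trace in the comment below: the archival path calls Option.get() on commit metadata that a requested/inflight instant does not yet have. A minimal reproduction of the pattern using java.util.Optional (illustrative only; the actual fix is described in the PR):

      import java.util.NoSuchElementException;
      import java.util.Optional;

      public class InflightMetadataDemo {
        public static void main(String[] args) {
          // A requested/inflight instant has no completed commit metadata yet.
          Optional<String> metadata = Optional.empty();
          try {
            metadata.get(); // throws, like Option.get() in the stack trace
          } catch (NoSuchElementException e) {
            System.out.println("No value present");
          }
          // Safe pattern: check presence (or skip non-completed instants)
          // before converting metadata for archival.
          metadata.ifPresent(m -> System.out.println("archive " + m));
        }
      }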

      Attachments

      Activity

          Susu Dong added a comment -

          jagmeet.bali satish

          The fix of checking whether partitionToReplaceFileIds is empty works on release 0.7.0; however, it does not work on the latest 0.9.0-SNAPSHOT. I believe this has something to do with the new clustering feature.

          In particular, a new exception is thrown:

          java.util.NoSuchElementException: No value present in Option
          	at org.apache.hudi.common.util.Option.get(Option.java:88)
          	at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:77)
          	at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
          	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
          	at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
          	at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
          	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
          	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
          	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:483)
          	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:225)
          	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:161)
          	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
          	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
          	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
          	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
          	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
          	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
          	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
          	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
          	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
          	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
          	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
          	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
          	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
          	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
          	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
          	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
          	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
          	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
          	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
          	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
          	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:50)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:54)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:56)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:58)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:60)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:62)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:64)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:66)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:68)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:70)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:72)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:74)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:76)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:78)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:80)
          	at $line34.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:82)
          	at $line34.$read$$iw$$iw$$iw$$iw.<init>(<console>:84)
          	at $line34.$read$$iw$$iw$$iw.<init>(<console>:86)
          	at $line34.$read$$iw$$iw.<init>(<console>:88)
          	at $line34.$read$$iw.<init>(<console>:90)
          	at $line34.$read.<init>(<console>:92)
          	at $line34.$read$.<init>(<console>:96)
          	at $line34.$read$.<clinit>(<console>)
          	at $line34.$eval$.$print$lzycompute(<console>:7)
          	at $line34.$eval$.$print(<console>:6)
          	at $line34.$eval.$print(<console>)
          	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
          	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
          	at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
          	at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
          	at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
          	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
          	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
          	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
          	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
          	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
          	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
          	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
          	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
          	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
          	at org.apache.spark.repl.Main$.doMain(Main.scala:78)
          	at org.apache.spark.repl.Main$.main(Main.scala:58)
          	at org.apache.spark.repl.Main.main(Main.scala)
          	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
          	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
          	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
          	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
          	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
          	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
          	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
          	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
          

          The fix might have a bigger impact, and I am looking into it now.


          People

            Susu Dong (susudong)
            Jagmeet Bali (jagmeet.bali)
            Votes: 0
            Watchers: 2
