Apache Hudi / HUDI-2658

When auto clean is disabled, do not check whether MIN_COMMITS_TO_KEEP is larger than CLEANER_COMMITS_RETAINED


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid

    Description

      The exception shown below is thrown even though auto clean is disabled.

      21/10/18 05:54:20,149 ERROR Misc: Streaming batch fail, shutting down whole application immediately.
      java.lang.IllegalArgumentException: Increase hoodie.keep.min.commits=3 to be greater than hoodie.cleaner.commits.retained=10. Otherwise, there is risk of incremental pull missing data from few instants.
          at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
          at org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:355)
          at org.apache.hudi.config.HoodieWriteConfig$Builder.setDefaults(HoodieWriteConfig.java:1396)
          at org.apache.hudi.config.HoodieWriteConfig$Builder.build(HoodieWriteConfig.java:1436)
          at org.apache.hudi.DataSourceUtils.createHoodieConfig(DataSourceUtils.java:188)
          at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:193)
          at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
          at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
          at scala.Option.getOrElse(Option.scala:121)
          at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:166)
          at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
          at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
          at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
          at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
          at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
          at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
          at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
          at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
          at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
          at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
          at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
          at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
          at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
          at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
          at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
          at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
          at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
          at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
          at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
          at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
          at tv.freewheel.reporting.ssql.sinkers.HudiSinker.sink(HudiSinker.scala:20)
          at tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1$$anonfun$apply$1.apply$mcV$sp(RuleScheduler.scala:73)
          at tv.freewheel.reporting.realtime.utils.Misc$.failFast(Misc.scala:72)
          at tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1.apply(RuleScheduler.scala:73)
          at tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1.apply(RuleScheduler.scala:71)
          at scala.Option.foreach(Option.scala:257)
          at tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler.execSink(RuleScheduler.scala:71)
          at tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$submitRecursively$3$$anonfun$1.apply$mcV$sp(RuleScheduler.scala:35)
          at tv.freewheel.reporting.realtime.utils.Misc$$anon$2.run(Misc.scala:31)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
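      For context, here is a minimal Spark/Scala sketch that hits the same validation path. It assumes a hypothetical table name and output path and that a matching hudi-spark bundle is on the classpath; the option values mirror the ones in the exception message above (hoodie.keep.min.commits=3, hoodie.cleaner.commits.retained=10), with auto cleaning turned off via hoodie.clean.automatic.

      import org.apache.spark.sql.{SaveMode, SparkSession}

      object Hudi2658Repro {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("hudi-2658-repro")
            .master("local[2]")
            .getOrCreate()
          import spark.implicits._

          // Illustrative data set; the schema is not relevant to the failure.
          val df = Seq((1, "a", 1000L)).toDF("id", "name", "ts")

          // Even with hoodie.clean.automatic=false, the write config is validated
          // in HoodieCompactionConfig.Builder#build() (see the stack trace above),
          // so the IllegalArgumentException is thrown before any data is written.
          df.write.format("hudi")
            .option("hoodie.table.name", "hudi_2658_repro")            // hypothetical table name
            .option("hoodie.datasource.write.recordkey.field", "id")
            .option("hoodie.datasource.write.precombine.field", "ts")
            .option("hoodie.clean.automatic", "false")                 // auto clean disabled
            .option("hoodie.keep.min.commits", "3")                    // values from the exception message
            .option("hoodie.cleaner.commits.retained", "10")
            .mode(SaveMode.Append)
            .save("/tmp/hudi_2658_repro")                              // hypothetical path
        }
      }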
      
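      Since the issue was closed as Invalid, the practical resolution is the one suggested by the exception message itself: keep hoodie.keep.min.commits greater than hoodie.cleaner.commits.retained even when auto clean is disabled. The values below are only illustrative.

      // Workaround per the exception message: archival bounds must stay above
      // the cleaner retention, even when hoodie.clean.automatic is false.
      val archivalAndCleaningOptions = Map(
        "hoodie.clean.automatic"          -> "false",
        "hoodie.cleaner.commits.retained" -> "10",
        "hoodie.keep.min.commits"         -> "11", // must exceed commits.retained
        "hoodie.keep.max.commits"         -> "12"  // must exceed keep.min.commits
      )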


People

    Assignee: Unassigned
    Reporter: zhangyue19921010 (Yue Zhang)
    Votes: 0
    Watchers: 2
