  1. Apache Hudi
  2. HUDI-716

Exception: Not an Avro data file when running HoodieCleanClient.runClean

Details

    Description

      Just upgraded from 0.5 to upstream master and am seeing an error at the end of the delta sync run:

      20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
      org.apache.hudi.exception.HoodieIOException: Not an Avro data file
        at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
        at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
        at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86)
        at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843)
        at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.io.IOException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
        at org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
        at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87)
        at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141)
        ... 24 more

       

      The cleaner is attempting to read an old cleanup file (2 months old) as a cleaner plan and crashing.
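
      To narrow down which timeline file trips the reader, it may help to check whether the old cleanup file starts with the Avro object container magic (the bytes 'O', 'b', 'j', 1 defined by the Avro specification). The sketch below is a hypothetical standalone helper, not part of Hudi: the class and method names are made up for illustration, it reads a local copy of the file rather than going through the Hadoop FileSystem API, and it checks only the current magic, not any legacy format the Avro reader may also accept. That the old file is empty or non-Avro is an assumption this report does not confirm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper (not a Hudi class): reports whether files
// begin with the Avro object container magic bytes.
public class AvroMagicCheck {
    // Magic header of an Avro object container file: 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given leading bytes start with the Avro magic. */
    public static boolean hasAvroMagic(byte[] head) {
        return head.length >= AVRO_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    /** Reads the first four bytes of a local file and checks them against the magic. */
    public static boolean looksLikeAvroFile(Path file) throws IOException {
        byte[] head = new byte[AVRO_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Loop because a single read() may return fewer bytes than requested.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // end of file: empty or shorter than the magic
                }
                read += n;
            }
        }
        return read == head.length && hasAvroMagic(head);
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a local copy of a timeline file.
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + " size=" + Files.size(p)
                + " avroMagic=" + looksLikeAvroFile(p));
        }
    }
}
```

      Running it against local copies of the clean files from the table's .hoodie directory would show whether the 2-month-old file is zero-length or in some other format, either of which would fail this check.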

       

      Attachments

        1. image-2020-03-21-13-37-17-039.png
          52 kB
          lamber-ken
        2. image-2020-03-21-02-45-25-099.png
          83 kB
          lamber-ken

            People

              lamber-ken lamber-ken
              afilipchik Alexander Filipchik
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m