Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-2675

Not an Avro data file

    XMLWordPrintableJSON

Details

    Description

      This exception may be thrown during archive commit and clean operation,I'm not sure if there are other places where this exception might be thrown.
      For archive commit ,the exception will be thrown when the .rollback file size is 0.
      For clean operation, the exception will also be thrown when the size of  `. clean`  and `. clean.requested`  and `. clean.inflight`  files is 0.

      I think we can filter out files of size 0 when we get clean and archive instances?  in addition, it is unclear what caused the .file size to be 0. Is it due to high concurrency or abnormal exit of the program due to network reasons?

      1. .rollback is 0

      o.a.hudi.table.HoodieTimelineArchiveLog Failed to archive commits, .commit file: 20210927181216.rollback
       java.io.IOException: Not an Avro data file
       at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
       at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:103)
       at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:341)
       at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:305)
       at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
       at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
       at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
       at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
       at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:427)
       at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:331)
       at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
       at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1166)
       at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
       at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)

      2. . clean is 0

      org.apache.hudi.exception.HoodieIOException: Failed to schedule clean operation
       at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:95)
       at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:107)
       at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.execute(BaseCleanPlanActionExecutor.java:129)
       at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.scheduleCleaning(HoodieJavaCopyOnWriteTable.java:182)
       at org.apache.hudi.client.AbstractHoodieWriteClient.scheduleTableServiceInternal(AbstractHoodieWriteClient.java:961)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:653)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:641)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:672)
       at org.apache.hudi.client.AbstractHoodieWriteClient.autoCleanOnCommit(AbstractHoodieWriteClient.java:505)
       at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:440)
       at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
       at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
       at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:405)
       at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:335)
       at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
       at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1166)
       at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
       at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Caused by: java.io.IOException: Not an Avro data file
       at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:152)
       at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsForCleanByCommits(CleanPlanner.java:150)
       at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsToClean(CleanPlanner.java:126)
       at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:73)
       ... 24 more

      3. `. clean.requested`  and `. clean.inflight`   is 0

      o.a.h.t.a.clean.BaseCleanActionExecutor Failed to perform previous clean operation, instant: [==>20211011143809__clean__REQUESTED]
       org.apache.hudi.exception.HoodieIOException: Not an Avro data file
       at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.runPendingClean(BaseCleanActionExecutor.java:87)
       at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.lambda$execute$0(BaseCleanActionExecutor.java:137)
       at java.util.ArrayList.forEach(ArrayList.java:1257)
       at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.execute(BaseCleanActionExecutor.java:134)
       at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.clean(HoodieJavaCopyOnWriteTable.java:188)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:660)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:641)
       at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:672)
       at org.apache.hudi.client.AbstractHoodieWriteClient.autoCleanOnCommit(AbstractHoodieWriteClient.java:505)
       at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:440)
       at org.apache.hudi.client.HoodieJavaWriteClient.postWrite(HoodieJavaWriteClient.java:187)
       at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:129)
       at org.apache.nifi.processors.javaHudi.JavaHudi.write(JavaHudi.java:401)
       at org.apache.nifi.processors.javaHudi.JavaHudi.onTrigger(JavaHudi.java:305)
       at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
       at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1166)
       at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:208)
       at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Caused by: java.io.IOException: Not an Avro data file
       at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
       at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:106)
       at org.apache.hudi.table.action.clean.BaseCleanActionExecutor.runPendingClean(BaseCleanActionExecutor.java:84)
       ... 24 common frames omitted

       

       

       

       

      Attachments

        Issue Links

          Activity

            People

              dongkelun 董可伦
              dongkelun 董可伦
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: