Flink / FLINK-21461

FileNotFoundException when sinking to Hive


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Connectors / Hive
    • Labels: None

    Description

      A FileNotFoundException occasionally occurs when reading from Kafka and sinking to Hive.

      The complete exception is as follows:

       

      2021-02-23 16:08:09
      org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught exception while processing timer.
          at org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1088)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1062)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1183)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$13(StreamTask.java:1172)
          at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
          at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
          at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:282)
          at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:190)
          at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530)
          at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
          at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: TimerException{java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: hdfs://xxx/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f}
          ... 12 more
      Caused by: java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: hdfs://data2/data/dw/qttods.db/age_fusion_log_hi/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f
          at org.apache.flink.connectors.hive.HiveTableSink$HiveRollingPolicy.shouldRollOnProcessingTime(HiveTableSink.java:556)
          at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onProcessingTime(Bucket.java:320)
          at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.onProcessingTime(Buckets.java:324)
          at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.onProcessingTime(StreamingFileSinkHelper.java:95)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1181)
          ... 11 more
      Caused by: java.io.FileNotFoundException: File does not exist: hdfs://xxx/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f
          at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
          at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
          at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
          at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
          at org.apache.flink.connectors.hive.write.HiveBulkWriterFactory$1.getSize(HiveBulkWriterFactory.java:54)
          at org.apache.flink.formats.hadoop.bulk.HadoopPathBasedPartFileWriter.getSize(HadoopPathBasedPartFileWriter.java:84)
          at org.apache.flink.connectors.hive.HiveTableSink$HiveRollingPolicy.shouldRollOnProcessingTime(HiveTableSink.java:554)
          ... 15 more
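The trace suggests the failure path: on a processing-time timer, HiveRollingPolicy.shouldRollOnProcessingTime asks the part-file writer for the size of the in-progress file, which ends in DistributedFileSystem.getFileStatus. If that file no longer exists, the FileNotFoundException is wrapped in an UncheckedIOException and the timer callback fails the task. The Java sketch below reproduces that shape on the local file system. It is a simplified illustration under stated assumptions, not Flink's code: the class RollingCheckSketch, its methods, the rollover parameters, and the order of the time and size checks are all hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the failure path suggested by the stack trace.
// Names and parameters are illustrative; this is NOT Flink's HiveRollingPolicy.
class RollingCheckSketch {

    // Stands in for the writer's getSize(): it asks the file system for the
    // in-progress file's size. A missing file is reported as
    // FileNotFoundException, the same type HDFS getFileStatus throws in the trace.
    static long getSize(Path inProgressFile) throws IOException {
        if (!Files.exists(inProgressFile)) {
            throw new FileNotFoundException("File does not exist: " + inProgressFile);
        }
        return Files.size(inProgressFile);
    }

    // Stands in for a rolling check run from a processing-time timer: roll when
    // the file is old enough or large enough. The checked IOException cannot
    // propagate from here, so it is wrapped in UncheckedIOException.
    static boolean shouldRollOnProcessingTime(Path inProgressFile, long creationTime,
                                              long currentTime, long rolloverIntervalMs,
                                              long maxPartSizeBytes) {
        try {
            return currentTime - creationTime >= rolloverIntervalMs
                    || getSize(inProgressFile) > maxPartSizeBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile(".part-", ".inprogress");
        Files.write(part, new byte[16]);
        // A young, small in-progress file does not roll.
        System.out.println(shouldRollOnProcessingTime(part, 0L, 1_000L, 60_000L, 1024L));

        // Simulate something outside the writer removing the in-progress file.
        Files.delete(part);
        try {
            shouldRollOnProcessingTime(part, 0L, 1_000L, 60_000L, 1024L);
        } catch (UncheckedIOException e) {
            System.out.println("timer callback would fail with: " + e.getCause());
        }
    }
}
```

In this sketch the size lookup only happens when the time-based check has not already triggered a roll, so a deleted in-progress file fails the timer only in that case.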
      

      The sink SQL looks like this:

       
      insert into hive_catalog.my_db.sink_table
      /*+ OPTIONS('is_generic'='false',
      'format'='parquet',
      'sink.partition-commit.delay'='60s',
      'sink.partition-commit.policy.kind'='metastore,success-file',
      'sink.partition-commit.success-file.name'='_SUCCESS',
      'table.exec.hive.fallback-mapred-writer'='false') */
      select
      log_timestamp,
      ip,
      field,
      from_unixtime(log_timestamp/1000,'yyyy-MM-dd') as `dt`,
      from_unixtime(log_timestamp/1000,'HH') as `hh`
      from source_table;
       


    People

      Assignee: Unassigned
      Reporter: ZhuShang (zhuxiaoshang)
      Votes: 0
      Watchers: 2
