Hive / HIVE-22247

HiveHFileOutputFormat throws FileNotFoundException when a partition's task output is empty


Details

    Description

      When a partition's task output is empty (the task writes 0 records), the record writer from HiveHFileOutputFormat throws a FileNotFoundException on close, as in this task log:

      2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing... 
      2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 0
      2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
      2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0
      2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
      2019-09-24 19:15:55,915 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
      2019-09-24 19:15:55,954 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
      2019-09-24 19:15:56,089 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
      2019-09-24 19:15:56,090 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:200)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1016)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
        ... 7 more
      Caused by: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:880)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:109)
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:938)
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:934)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:945)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1592)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1632)
        at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:153)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:197)
        ... 11 more
      
      2019-09-24 19:15:56,093 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
      

      I think we should skip moving the region files when srcDir does not exist. Proposed fix:

      // Requires java.io.FileNotFoundException to be imported in HiveHFileOutputFormat.
      @Override
      public void close(boolean abort) throws IOException {
        try {
      
          ...
      
          FileStatus[] files = null;
          try {
            // List srcDir once; the original for (;;) around this call never exited when listStatus succeeded.
            files = fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER);
          } catch (FileNotFoundException fnfe) {
            // srcDir is missing when the task wrote no records: log it and skip the move.
            LOG.error(String.format("Output data is empty, please check Task [ %s ]",
                tac.getTaskAttemptID().toString()), fnfe);
          }
          if (files != null) {
            // Reuse the listing instead of calling listStatus again, which would throw the same exception.
            for (FileStatus regionFile : files) {
              fs.rename(regionFile.getPath(), new Path(columnFamilyPath, regionFile.getPath().getName()));
            }
          }
      
          ...
      
        } catch (InterruptedException ex) {
          throw new IOException(ex);
        }
      }
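For illustration only, the "skip the move when the source directory is missing" logic can be exercised locally without a cluster. This is a hedged analogue, not Hive code: it uses java.nio.file instead of Hadoop's FileSystem, NoSuchFileException stands in for the FileNotFoundException that DistributedFileSystem.listStatus throws in the trace above, and the class EmptyTaskOutputSketch, the method moveRegionFiles and the directory names are hypothetical. Like the proposed fix, it lists the source directory once, treats a missing directory as empty output, and reuses that single listing for the moves.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local analogue of the proposed HiveHFileOutputFormat fix; not Hive or Hadoop code.
public class EmptyTaskOutputSketch {

  // Moves every entry of srcDir into familyDir and returns how many entries were moved.
  // A missing srcDir (a task that wrote no records) is treated as empty output and yields 0.
  static int moveRegionFiles(Path srcDir, Path familyDir) throws IOException {
    List<Path> files = new ArrayList<>();
    // List the source directory once, like the single listStatus call in the proposed fix.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
      for (Path entry : stream) {
        files.add(entry);
      }
    } catch (NoSuchFileException e) {
      // Stand-in for the FileNotFoundException from HDFS listStatus: log and skip the move.
      System.err.println("Output data is empty, skipping move for " + srcDir);
      return 0;
    }
    Files.createDirectories(familyDir);
    // Reuse the collected listing instead of listing srcDir a second time.
    for (Path file : files) {
      Files.move(file, familyDir.resolve(file.getFileName()));
    }
    return files.size();
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("hfile-sketch");
    Path familyDir = root.resolve("cf");

    // A task that wrote records: its output directory exists and holds one file.
    Path nonEmptyTask = Files.createDirectories(root.resolve("_task_tmp").resolve("_tmp.000001_0"));
    Files.createFile(nonEmptyTask.resolve("region-file"));
    System.out.println("moved from non-empty task: " + moveRegionFiles(nonEmptyTask, familyDir));

    // A task that wrote no records: its output directory was never created.
    Path emptyTask = root.resolve("_task_tmp").resolve("_tmp.000002_0");
    System.out.println("moved from empty task: " + moveRegionFiles(emptyTask, familyDir));
  }
}
```

Running main should print "moved from non-empty task: 1" and then "moved from empty task: 0", with the skip message on standard error. Catching the exception, rather than calling fs.exists(srcDir) before listStatus, avoids an extra NameNode round trip and the gap between the check and the listing; either approach should avoid the failure shown in the log.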
      

          People

            Ayush Saxena (ayushtkn)
            xiepengjie
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 20m