Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.2.0, 3.0.0
-
None
Description
When partition's task output empty, HiveHFileOutputFormat throws FileNotFoundException like this:
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing... 2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 0 2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0 2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0 2019-09-24 19:15:55,915 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1 2019-09-24 19:15:55,954 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 2019-09-24 19:15:56,089 ERROR [main] ExecReducer: Hit error while closing operators - failing tree 2019-09-24 19:15:56,090 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist. at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist. at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:200) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1016) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278) ... 7 more Caused by: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:880) at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:109) at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:938) at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:934) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:945) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1592) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1632) at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:153) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:197) ... 11 more 2019-09-24 19:15:56,093 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
I think we should skip it if srcDir do not exist, fix like this:
@Override public void close(boolean abort) throws IOException { try { ... FileStatus [] files = null; for (;;) { try { files = fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER); } catch (FileNotFoundException fnfe) { LOG.error(String.format("Output data is empty, please check Task [ %s ]", tac.getTaskAttemptID().toString()), fnfe); break; } } if (files != null ) { for (FileStatus regionFile : fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER)) { fs.rename(regionFile.getPath(), new Path(columnFamilyPath, regionFile.getPath().getName())); } } for (FileStatus regionFile : fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER)) { fs.rename( ... } catch (InterruptedException ex) { throw new IOException(ex); } }
Attachments
Issue Links
- links to