Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
As part of the direct insert optimisation, the files from Tez jobs are moved to the table directory for ACID tables, and then duplicate removal is done. (The same issue exists for MM tables even without the direct insert optimisation.) Each session scans through the table directory and cleans up the files belonging to its own session, but the iterator is created over all the files in the directory. So when multiple sessions act on the same table and the first session cleans up its data while the second session is still reading it, a FileNotFoundException is thrown.
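The race can be reproduced in miniature on a local filesystem. This is an illustrative sketch only, not Hive code: two "sessions" share one table directory, one session deletes its temp delta while the other session's iterator over all files is still open, and stat'ing the vanished entry fails, analogous to the HDFS DirListingIterator throwing FileNotFoundException.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class SharedDirRace {
    public static void main(String[] args) throws IOException {
        // Two "sessions" write temp deltas into the same table directory.
        Path table = Files.createTempDirectory("tbl4");
        Path s1 = Files.createFile(table.resolve("_tmp.delta_0000001_0000001_0000"));
        Files.createFile(table.resolve("_tmp.delta_0000002_0000002_0000"));

        List<String> seen = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(table)) {
            // Session 1 cleans up its own delta while session 2's
            // iterator over *all* files is still open.
            Files.delete(s1);
            for (Path p : listing) {
                try {
                    // Stat'ing a concurrently deleted entry fails, the local
                    // analogue of the FileNotFoundException in the traces below.
                    Files.readAttributes(p, BasicFileAttributes.class);
                    seen.add(p.getFileName().toString());
                } catch (NoSuchFileException gone) {
                    // Without such defensive handling, the whole listing aborts.
                }
            }
        }
        System.out.println(seen); // only session 2's delta survives
    }
}
```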
The stack trace below is fixed as part of HIVE-24679:
Caused by: java.io.FileNotFoundException: File hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/_tmp.delta_0000981_0000981_0000 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2816) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
The code path below is fixed by HIVE-24682:
Caused by: java.io.FileNotFoundException: File hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/.hive-staging_hive_2022-01-19_05-18-38_933_1683918321120508074-54 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Utilities.getFullDPSpecs(Utilities.java:2971) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
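The linked fixes (HIVE-24679, HIVE-24682, HIVE-24738) share one idea: collect the committed file list once at write time and reuse it, instead of re-listing the shared directory where other sessions may be deleting files. A minimal sketch of that idea, with hypothetical names (writeManifest/readManifest are illustrative, not the Hive API):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class ManifestReuse {
    // Hypothetical helper: the writing task records the files it committed.
    static void writeManifest(Path table, List<String> committed) throws IOException {
        Files.write(table.resolve("_manifest_session1"), committed);
    }

    // Hypothetical helper: later phases read the manifest instead of
    // re-listing the directory, so files deleted by other sessions
    // can no longer break an open listing iterator.
    static List<String> readManifest(Path table) throws IOException {
        return Files.readAllLines(table.resolve("_manifest_session1"));
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("tbl4");
        writeManifest(table, List.of("delta_0000001_0000001_0000/bucket_00000"));
        System.out.println(readManifest(table));
    }
}
```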
Issue Links
- is caused by: HIVE-23410 ACID: Improve the delete and update operations to avoid the move step (Closed)
- relates to: HIVE-24679 Reuse FullDPSpecs in loadDynamicPartitions to avoid double listing (Closed)
- relates to: HIVE-24682 Collect dynamic partition info in FileSink for direct insert and reuse it in Movetask (Closed)
- relates to: HIVE-24738 Reuse committed filelist from directInsert manifest during loadPartition (Closed)