Details
- Bug
- Status: Patch Available
- Major
- Resolution: Unresolved
- 2.2.0, 3.0.0
- None
- None
- YHIVE-751
Description
For large Pig/HCat queries that produce a large number of partitions/directories/files, we have seen cases where the HDFS NameNode groaned under the weight of FileSystem.setOwner() calls, originating from the commit-step. This was the result of the following code in FileOutputCommitterContainer:
    private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
                                    List<AclEntry> acls, String group, boolean recursive)
        throws IOException {
      ...
      if (recursive) {
        for (FileStatus fileStatus : fs.listStatus(dir)) {
          if (fileStatus.isDir()) {
            applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
          } else {
            fs.setPermission(fileStatus.getPath(), permission);
            chown(fs, fileStatus.getPath(), group);
          }
        }
      }
    }

    private void chown(FileSystem fs, Path file, String group) throws IOException {
      try {
        fs.setOwner(file, null, group);
      } catch (AccessControlException ignore) {
        // Some users have wrong table group, ignore it.
        LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
      }
    }
One setOwner() call per file/directory is far too many. We have a patch that reduces the NameNode pressure.
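For illustration only, here is one way the call volume could be cut. This is a sketch and not necessarily what the attached patch does. The idea is that the listing already reports each entry's group, so the owner-changing call can be skipped whenever the group already matches. To keep the example self-contained, `Entry` and `CountingFs` are hypothetical in-memory stand-ins for Hadoop's `FileStatus` and `FileSystem`, and `setGroup` stands in for `fs.setOwner(path, null, group)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupFixSketch {

  /** Hypothetical stand-in for the parts of FileStatus used here. */
  static final class Entry {
    final String path;
    final boolean isDir;
    String group;

    Entry(String path, boolean isDir, String group) {
      this.path = path;
      this.isDir = isDir;
      this.group = group;
    }
  }

  /** Hypothetical in-memory stand-in for FileSystem that counts group changes. */
  static final class CountingFs {
    private final Map<String, List<Entry>> children = new HashMap<>();
    int setGroupCalls = 0;

    Entry add(String parent, String name, boolean isDir, String group) {
      Entry e = new Entry(parent + "/" + name, isDir, group);
      children.computeIfAbsent(parent, k -> new ArrayList<>()).add(e);
      return e;
    }

    List<Entry> listStatus(String dir) {
      return children.getOrDefault(dir, Collections.<Entry>emptyList());
    }

    // Stands in for fs.setOwner(path, null, group): one NameNode round trip each.
    void setGroup(Entry e, String group) {
      setGroupCalls++;
      e.group = group;
    }
  }

  /** Recursively fixes the group, but only for entries whose group differs. */
  static void applyGroup(CountingFs fs, String dir, String group) {
    for (Entry e : fs.listStatus(dir)) {
      if (!group.equals(e.group)) {
        fs.setGroup(e, group); // skipped when the listing already shows the right group
      }
      if (e.isDir) {
        applyGroup(fs, e.path, group);
      }
    }
  }

  public static void main(String[] args) {
    CountingFs fs = new CountingFs();
    fs.add("/t", "p=1", true, "hive");
    fs.add("/t/p=1", "part-0", false, "users");
    fs.add("/t/p=1", "part-1", false, "hive");
    applyGroup(fs, "/t", "hive");
    System.out.println("group changes issued: " + fs.setGroupCalls);
  }
}
```

In the common case where new files already inherit the table's group from the parent directory, a check like this would issue no owner-changing calls at all; the listing calls remain, one per directory.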
Attachments
Issue Links
- is blocked by
- HIVE-17803 With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs (Closed)