Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Invalid
-
2.0.0-alpha
-
None
-
None
-
None
Description
This is a high-on-nitpick ticket for the benefit of troubleshooting.
This is partially related to all the PB-changes we've had. And also partially related to Java/JVMs.
Take a case of an AccessControlException, which is pretty common in HDFS permissions layer. We now get, due to several more calls added at the RPC layer for PB (or maybe something else, if am mistaken):
Caused by: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4204)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4175)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2565)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2529)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:640)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42618)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1204)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:205)
at $Proxy10.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
at $Proxy10.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:430)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1717)
... 9 more
The "9 more" is what I was looking for, to identify the caller to debug on/find the exact directory. However it now gets eaten away cause just the mkdir-to-exception trace itself has grown quite a bit. Comparing this to 0.20, we have much fewer calls and that helps us see at least the real caller of mkdirs.
I'm actually not sure what causes Java to print "... X more" in these form of exception prints, but if thats controllable am all in favor of increasing its amount for HDFS (using new default java opts?). So that when an exception does occur, we don't get a nearly-unusable stacktrace.