Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
Reviewed
Description
When I test the node label feature in my local environment, I encountered the following exception:
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2222) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745)
The reason is that HDFS throws an exception when calling ensureAppendEditlogFile because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call ensureAppendEditlogFile, lease recovery will failed because we are just the lease holder.