Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
This appears to be a YARN issue (YARN-2441) that occurs when the NM is relaunched on the same node where its containers were previously active/running.
15/10/15 10:43:18 INFO ipc.Server: Socket Reader #1 for port 31000: readAndProcess from client 10.10.101.113 threw exception [java.lang.NullPointerException]
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.getPassword(DigestAuthMethod.java:212)
at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.handle(DigestAuthMethod.java:238)
at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1393)
at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1370)
at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1283)
at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1246)
at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1896)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1764)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1528)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:640)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:611)
15/10/15 10:43:22 INFO security.NMContainerTokenSecretManager: Updating node address : qa101-116.qa.lab:31000
The issue is that the AM tries to connect to the NM before the NM has finished registering with the RM.
Myriad can work around this by picking ports at random from the list of ports it receives from Mesos, so that the RM can tell the relaunched NM apart from the previous one.
That is, we can select the NM ports randomly instead of selecting the first few ports, as implemented here:
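
As a rough illustration of the proposal (this is not Myriad's actual code), random selection from the offered ports could look like the sketch below. The class name RandomPortPicker, the method pickPorts, and the representation of the Mesos offer as a flat list of port numbers are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: choose NM ports at random from the ports offered by
// Mesos instead of always taking the first few. Names are illustrative only.
public class RandomPortPicker {

    // Returns `count` distinct ports drawn at random from `offeredPorts`.
    // The input list is not modified.
    static List<Long> pickPorts(List<Long> offeredPorts, int count, Random rng) {
        if (count < 0 || count > offeredPorts.size()) {
            throw new IllegalArgumentException(
                "Requested " + count + " ports but only "
                    + offeredPorts.size() + " were offered");
        }
        // Shuffle a copy, then take the first `count` entries.
        List<Long> shuffled = new ArrayList<>(offeredPorts);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        // Assumed example offer: ports 31000 through 31009.
        List<Long> offered = new ArrayList<>();
        for (long p = 31000; p < 31010; p++) {
            offered.add(p);
        }
        List<Long> nmPorts = pickPorts(offered, 4, new Random());
        System.out.println("Selected NM ports: " + nmPorts);
    }
}
```

With random selection, a relaunched NM on the same node is unlikely to reuse the address of the previous NM, although a collision remains possible when the offer contains only a few ports.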