Myriad / MYRIAD-155

Relaunching the NM on the same node where YARN containers were previously running causes a NullPointerException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: Myriad 0.1.0
    • Component/s: None
    • Labels: None

    Description

      This appears to be a YARN issue (YARN-2441) that occurs when the NM is relaunched on the same node where containers were previously running. The NodeManager log shows:

      15/10/15 10:43:18 INFO ipc.Server: Socket Reader #1 for port 31000: readAndProcess from client 10.10.101.113 threw exception [java.lang.NullPointerException]
      java.lang.NullPointerException
          at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
          at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
          at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.getPassword(DigestAuthMethod.java:212)
          at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.handle(DigestAuthMethod.java:238)
          at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
          at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
          at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1393)
          at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1370)
          at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1283)
          at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1246)
          at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1896)
          at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1764)
          at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1528)
          at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
          at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:640)
          at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:611)
      15/10/15 10:43:22 INFO security.NMContainerTokenSecretManager: Updating node address : qa101-116.qa.lab:31000

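      The root cause is that the AM tries to connect to the NM before the NM has finished registering with the RM. In the log above, the NPE at 10:43:18 precedes the NM's "Updating node address" message at 10:43:22.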

      Myriad could work around this by choosing the NM ports at random from the ports it receives in the Mesos offer, so that the RM sees a relaunched NM as distinct from the previous one on the same node.

      That is, select the NM ports randomly instead of taking the first few ports of the offer, as currently implemented here:

      https://github.com/apache/incubator-myriad/blob/master/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMPorts.java#L46
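      The proposal can be sketched in Java. This is a minimal, hypothetical helper, not Myriad's NMPorts implementation: the class RandomPortSelector, its PortRange type and the selectPorts method are illustrative names. It assumes the offer's port ranges have already been extracted as non-overlapping, inclusive [begin, end] pairs within the TCP port space, as in the Mesos "ports" ranges resource.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Myriad's actual NMPorts code: choose NM ports at random
// from the Mesos-offered port ranges instead of always taking the first ones.
public final class RandomPortSelector {

    /** Inclusive port range, e.g. [31000-31099], taken from an offer's "ports" resource. */
    public static final class PortRange {
        final long begin;
        final long end;

        public PortRange(long begin, long end) {
            if (begin > end) {
                throw new IllegalArgumentException("begin > end: " + begin + " > " + end);
            }
            this.begin = begin;
            this.end = end;
        }
    }

    private RandomPortSelector() {}

    /**
     * Returns {@code count} distinct ports drawn uniformly at random from {@code ranges}.
     * Assumes non-overlapping ranges inside the TCP port space (at most 65536 ports),
     * so expanding every offered port into one list and shuffling it stays cheap.
     */
    public static List<Long> selectPorts(List<PortRange> ranges, int count, Random random) {
        List<Long> candidates = new ArrayList<>();
        for (PortRange r : ranges) {
            for (long p = r.begin; p <= r.end; p++) {
                candidates.add(p);
            }
        }
        if (count < 0 || count > candidates.size()) {
            throw new IllegalArgumentException(
                "requested " + count + " ports but the offer has " + candidates.size());
        }
        // Shuffle all candidates, then keep the first `count`: a uniform random subset.
        Collections.shuffle(candidates, random);
        return new ArrayList<>(candidates.subList(0, count));
    }

    public static void main(String[] args) {
        // Example offer with two port ranges; requesting 4 ports is an illustrative choice.
        List<PortRange> offer = List.of(new PortRange(31000, 31099), new PortRange(31200, 31299));
        System.out.println(selectPorts(offer, 4, new Random()));
    }
}
```

      Passing the Random in makes the selection reproducible in tests. Random selection only makes it unlikely, not impossible, that a relaunched NM reuses its predecessor's port; a stricter variant could also exclude the ports the previous NM on that host used.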

    People

    • Assignee: Unassigned
    • Reporter: Sarjeet Singh
    • Votes: 0
    • Watchers: 3