Hadoop YARN / YARN-3152

Missing hadoop exclude file fails RMs in HA

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None
    • Environment: Debian 7

      Description

      I have two NNs in HA, and they do not fail when the exclude file (hadoop-2.6.0/etc/hadoop/exclude) is not present. I had one RM and wanted to run two in HA. I had not created the exclude file at this point either. I applied the HA RM settings, and when I started both RMs I got this exception:

      2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
      2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
      org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
      at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
      at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
      at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
      at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
      at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
      Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
      at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
      at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
      ... 4 more
      Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
      at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
      at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
      ... 5 more
      2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
      2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
      2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
      2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
      2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session

      The exception is descriptive enough to resolve the problem, and I fixed it by creating the exclude file.

      I am raising this as a possible improvement:

      • Should RMs ignore the missing file, as the NNs do?
      • Should a single RM also fail when the file is not present?

      I am suggesting this to keep the behavior consistent when working in HA (both NNs and RMs).
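
      To make the first option concrete, here is a minimal sketch of what "ignore the missing file" could look like. It is not the actual Hadoop code: the class LenientExcludeFile and the method readHostsOrEmpty are hypothetical names, and the file format (one host per line, with # comments) is an assumption made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: treats a missing exclude file
// as "no hosts excluded" instead of failing.
final class LenientExcludeFile {

    // Returns the hosts listed in the file, or an empty set if the file
    // does not exist. Other I/O errors are still propagated.
    static Set<String> readHostsOrEmpty(Path excludeFile) throws IOException {
        Set<String> hosts = new LinkedHashSet<>();
        try {
            for (String line : Files.readAllLines(excludeFile, StandardCharsets.UTF_8)) {
                String host = line.trim();
                // Assumed format: skip blank lines and lines starting with '#'.
                if (!host.isEmpty() && !host.startsWith("#")) {
                    hosts.add(host);
                }
            }
        } catch (NoSuchFileException e) {
            // The missing file is tolerated here; a real fix would probably log a warning.
            return Collections.emptySet();
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the stack trace above; adjust for your installation.
        Path exclude = Paths.get("/hadoop-2.6.0/etc/hadoop/exclude");
        System.out.println("Excluded hosts: " + readHostsOrEmpty(exclude));
    }
}
```

      Only the "file not found" case is swallowed in this sketch; any other read error still surfaces, so a genuinely broken configuration would not be hidden.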

              People

              • Assignee: Naganarasimha Naganarasimha G R
              • Reporter: neillfontes Neill Lima
              • Votes: 0
              • Watchers: 9
