  1. Ambari
  2. AMBARI-13373

HDFS balancer via Ambari fails to run once HA is enabled

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.1.2
    • Fix Version/s: 2.2.0
    • Component/s: ambari-web
    • Labels:
      None

      Description

      Running the HDFS balancer via Ambari on a cluster with NameNode HA enabled fails.

      Starting balancer with threshold = 10
      Executing command ambari-sudo.sh su hdfs -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"' ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'
      2015-10-06 23:33:27,059 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"' ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10''] {'logoutput': False, 'on_new_line': handle_new_line}
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: Using a threshold of 10.0
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes  = [hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: included nodes = []
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: excluded nodes = []
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: source nodes = []
      [balancer] Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved[balancer] 
      [balancer] org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
      	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1872)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1306)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1618)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:595)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
      .  Exiting ...[balancer] 
      [balancer] Oct 6, 2015 11:33:31 PM [balancer]  [balancer] Balancing took 2.281 seconds[balancer]
      

      Looking at the log, it appears that a NameNode in standby state is being added to the namenodes list alongside the nameservice. Should we not be using just the nameservice?

      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes  = [hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]
      [balancer] 15/10/06 23:33:29 INFO balancer.Balancer: parameters = Balancer.Parameters 
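
      A minimal diagnostic sketch of the observation above, in Python: it parses the balancer's "namenodes = [...]" log line and separates entries that carry an explicit port from those that do not. Treating a port-less URI as a logical nameservice is an assumption made for illustration, and the helper names parse_namenodes and split_targets are hypothetical, not part of Ambari or Hadoop.

```python
import re
from urllib.parse import urlparse

# Matches the bracketed list in a balancer "namenodes = [...]" log line.
NAMENODES_RE = re.compile(r"namenodes\s*=\s*\[(.*?)\]")


def parse_namenodes(log_line):
    """Return the URIs listed in a balancer 'namenodes = [...]' log line."""
    match = NAMENODES_RE.search(log_line)
    if not match:
        return []
    return [uri.strip() for uri in match.group(1).split(",") if uri.strip()]


def split_targets(uris):
    """Split URIs into (direct, logical).

    Assumption: a URI with an explicit port is a direct NameNode RPC
    address; a URI without one is a logical nameservice name.
    """
    direct, logical = [], []
    for uri in uris:
        if urlparse(uri).port is not None:
            direct.append(uri)
        else:
            logical.append(uri)
    return direct, logical


# Log line copied from the report above.
line = (
    "[balancer] 15/10/06 23:33:29 INFO balancer.Balancer: namenodes  = "
    "[hdfs://pre-prod-poc-1.novalocal:8020, hdfs://pre-prod-hdp-2-3]"
)
direct, logical = split_targets(parse_namenodes(line))
print("direct NameNode addresses:", direct)
print("logical nameservices:", logical)
```

      On the log line from this report, the first list holds the host:port entry and the second holds the nameservice. A non-empty first list on an HA cluster is a hint that the balancer may contact a single NameNode directly, which fails if that NameNode is in standby.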
      
      [root@pre-prod-poc-1 hive-testbench]# ambari-server --hash
      226dfd1c6136f859fc42dd18e7090a9346f0f745
      
      [root@pre-prod-poc-1 hive-testbench]# rpm -qa | grep ambari
      ambari-metrics-hadoop-sink-2.1.2-370.x86_64
      ambari-server-2.1.2-370.x86_64
      ambari-metrics-monitor-2.1.2-370.x86_64
      ambari-agent-2.1.2-370.x86_64
      [root@pre-prod-poc-1 hive-testbench]#
      

        Attachments

        1. AMBARI-13373.patch
          0.7 kB
          Antonenko Alexander
        2. AMBARI-13373_2.patch
          12 kB
          Dmytro Sen

              People

              • Assignee:
                dsen Dmytro Sen
                Reporter:
                aantonenko Antonenko Alexander
              • Votes:
                0
                Watchers:
                5
