Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.7.1
-
None
-
Centos 6.6
-
Reviewed
Description
When balancer is launched, it should test if there is already a /system/balancer.id file in HDFS.
When the file doesn't exist, the balancer don't want to run :
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log file when trying to run the balancer, the balancer create the /system/balancer.id and then delete it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r----- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
The function checkAndMarkRunning return null even if the /system/balancer.id doesn't exist before entering this function; if it exists, then it is deleted and the balancer exit with the same error.
private OutputStream checkAndMarkRunning() throws IOException {
try {
if (fs.exists(idPath))
final FSDataOutputStream fsout = fs.create(idPath, false);
// mark balancer idPath to be deleted during filesystem closure
fs.deleteOnExit(idPath);
if (write2IdFile)
return fsout;
} catch(RemoteException e) {
if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName()))
else
{ throw e; } }
}
Regards
Attachments
Attachments
Issue Links
- relates to
-
AMBARI-13946 Non NameNode-HA properties still in hdfs-site.xml causing (at least) Balancer and ATS to fail
-
- Resolved
-
-
HDFS-6300 Prevent multiple balancers from running simultaneously
-
- Closed
-