Hadoop HDFS / HDFS-8897

Balancer should handle fs.defaultFS trailing slash in HA

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: balancer & mover
    • Labels: None
    • Environment: CentOS 6.6

      Description

      When the balancer is launched, it should check whether a /system/balancer.id file already exists in HDFS.

      Even when the file does not exist, the balancer refuses to run:

      15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
      15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
      Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
      15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
      java.io.IOException: Another Balancer is running.. Exiting ...
      Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
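
      Note that the first log line lists two namenodes, hdfs://sandbox/ and hdfs://sandbox, which differ only by the trailing slash. The following is a minimal sketch, using only java.net.URI, of why such a pair survives de-duplication in a set and how stripping the path collapses it. The class and method names (NameNodeUriSketch, stripPath, collectRaw, collectNormalized) are hypothetical and are not taken from the Hadoop source or from the attached patches.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustration only: hypothetical helper, not part of Hadoop. */
final class NameNodeUriSketch {

    /** Keeps only scheme and authority, dropping any path such as a trailing "/". */
    static URI stripPath(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), null, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot normalize " + uri, e);
        }
    }

    /** Collects the URIs as given; URI.equals treats path "/" and path "" as different. */
    static Set<URI> collectRaw(List<String> values) {
        Set<URI> result = new LinkedHashSet<>();
        for (String value : values) {
            result.add(URI.create(value));
        }
        return result;
    }

    /** Collects the URIs after stripping the path, so both spellings map to one entry. */
    static Set<URI> collectNormalized(List<String> values) {
        Set<URI> result = new LinkedHashSet<>();
        for (String value : values) {
            result.add(stripPath(URI.create(value)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("hdfs://sandbox/", "hdfs://sandbox");
        System.out.println("raw:        " + collectRaw(configured));
        System.out.println("normalized: " + collectNormalized(configured));
    }
}
```

      With the two values from the log, the raw set keeps both entries while the normalized set keeps one. Whether the balancer should normalize at this point or elsewhere is a design question this sketch does not answer.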

      According to the audit log captured while running the balancer, the balancer creates /system/balancer.id and then deletes it on exit:

      2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r----- proto=rpc
      2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc

      The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java.

      The function checkAndMarkRunning returns null even if /system/balancer.id does not exist before the function is entered; if the file does exist, it is deleted and the balancer exits with the same error.


      private OutputStream checkAndMarkRunning() throws IOException {
        try {
          if (fs.exists(idPath)) {
            // try appending to it so that it will fail fast if another balancer is
            // running.
            IOUtils.closeStream(fs.append(idPath));
            fs.delete(idPath, true);
          }
          final FSDataOutputStream fsout = fs.create(idPath, false);
          // mark balancer idPath to be deleted during filesystem closure
          fs.deleteOnExit(idPath);
          if (write2IdFile) {
            fsout.writeBytes(InetAddress.getLocalHost().getHostName());
            fsout.hflush();
          }
          return fsout;
        } catch(RemoteException e) {
          if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
            return null;
          } else {
            throw e;
          }
        }
      }
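
      One plausible reading of the behaviour above, given that the balancer log lists the same nameservice twice, is that the second connector finds the id file already held by the first and takes the branch that returns null. The toy model below only illustrates that sequence. FakeNamespace, tryMarkRunning and rejectedConnectors are hypothetical names, the in-memory set merely stands in for the file held open on the namenode, and nothing here calls the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these types are Hadoop classes. */
final class BalancerLockSketch {

    /** Hypothetical stand-in for one HDFS namespace; tracks files held open for write. */
    static final class FakeNamespace {
        private final Set<String> openForWrite = new HashSet<>();

        /** Returns true if the caller obtained the id file, false if it is already held. */
        boolean tryMarkRunning(String idPath) {
            // Set.add returns false when the path is already present, which stands in
            // for the AlreadyBeingCreatedException branch that returns null.
            return openForWrite.add(idPath);
        }
    }

    /** Returns the URIs whose connector could not obtain the id file. */
    static List<String> rejectedConnectors(List<String> namenodeUris, FakeNamespace shared) {
        List<String> rejected = new ArrayList<>();
        for (String uri : namenodeUris) {
            // Assumption of this model: every URI in the list reaches the same namespace.
            if (!shared.tryMarkRunning("/system/balancer.id")) {
                rejected.add(uri);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<String> namenodes = Arrays.asList("hdfs://sandbox/", "hdfs://sandbox");
        System.out.println("rejected: " + rejectedConnectors(namenodes, new FakeNamespace()));
    }
}
```

      In this model a single URI is never rejected, while a duplicated entry for the same namespace always is, which would match the "Another Balancer is running" message even though no other balancer exists.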


      Regards

        Attachments

        1. HDFS-8897-branch-2.006.patch
          11 kB
          John Zhuge
        2. HDFS-8897.006.patch
          11 kB
          John Zhuge
        3. HDFS-8897.005.patch
          8 kB
          John Zhuge
        4. HDFS-8897.004.patch
          7 kB
          John Zhuge
        5. HDFS-8897.003.patch
          7 kB
          John Zhuge
        6. HDFS-8897.002.patch
          6 kB
          John Zhuge
        7. HDFS-8897.001.patch
          3 kB
          John Zhuge

              People

              • Assignee: jzhuge John Zhuge
              • Reporter: Alexandre LINTE LINTE
              • Votes: 1
              • Watchers: 13
