  1. Hadoop HDFS
  2. HDFS-8897

Balancer should handle fs.defaultFS trailing slash in HA

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: balancer & mover
    • Labels: None
    • Environment: Centos 6.6

    Description

      When the balancer starts, it is supposed to check whether a /system/balancer.id file already exists in HDFS.

      Even when the file does not exist, the balancer refuses to run:

      15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
      15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
      Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
      15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
      15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
      15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
      java.io.IOException: Another Balancer is running.. Exiting ...
      Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
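
      Note that the first log line lists the same nameservice twice, once with a trailing slash and once without. The sketch below shows why java.net.URI treats these as two distinct entries, and one possible way to collapse them by keeping only scheme and authority. The class and method names (NameNodeUriDedup, stripPath, dedupe) are hypothetical and only illustrate the idea; this is not the code from the attached patches.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: illustrates the trailing-slash mismatch.
final class NameNodeUriDedup {

    /** Keeps only scheme and authority, so "hdfs://sandbox/" and "hdfs://sandbox" compare equal. */
    public static URI stripPath(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), null, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot normalize " + uri, e);
        }
    }

    /** Normalizes every URI and drops duplicates, preserving first-seen order. */
    public static Set<URI> dedupe(Collection<URI> uris) {
        Set<URI> result = new LinkedHashSet<>();
        for (URI uri : uris) {
            result.add(stripPath(uri));
        }
        return result;
    }

    public static void main(String[] args) {
        List<URI> configured = List.of(
            URI.create("hdfs://sandbox/"),   // path is "/"
            URI.create("hdfs://sandbox"));   // path is empty

        // URI.equals compares the path component, so both entries survive in a set.
        System.out.println(new LinkedHashSet<>(configured).size()); // prints 2

        // After normalization only one entry remains.
        System.out.println(dedupe(configured)); // prints [hdfs://sandbox]
    }
}
```

      With two entries in the list, the balancer would open two connections to what is really one nameservice, which is consistent with the duplicated KeyManager lines in the log above.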

      The audit log recorded while running the balancer shows that it creates /system/balancer.id and then deletes it on exit:

      2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r----- proto=rpc
      2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
      2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc

      The error appears to be in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java.

      The function checkAndMarkRunning returns null even if /system/balancer.id does not exist before the call; if the file does exist, it is deleted and the balancer exits with the same error.


      private OutputStream checkAndMarkRunning() throws IOException {
        try {
          if (fs.exists(idPath)) {
            // try appending to it so that it will fail fast if another balancer is
            // running.
            IOUtils.closeStream(fs.append(idPath));
            fs.delete(idPath, true);
          }
          final FSDataOutputStream fsout = fs.create(idPath, false);
          // mark balancer idPath to be deleted during filesystem closure
          fs.deleteOnExit(idPath);
          if (write2IdFile) {
            fsout.writeBytes(InetAddress.getLocalHost().getHostName());
            fsout.hflush();
          }
          return fsout;
        } catch(RemoteException e) {
          if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
            return null;
          } else {
            throw e;
          }
        }
      }
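
      One plausible reading of the behaviour, given the duplicated entry in the "namenodes" log line, is that the lock is taken once per configured URI: the first attempt creates /system/balancer.id and holds it open, and the second attempt against the same nameservice is rejected, so checkAndMarkRunning returns null. The toy model below only illustrates that sequence. It uses an in-memory set in place of HDFS leases, and BalancerLockDemo and acquireAll are hypothetical names, not Hadoop APIs.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: an in-memory stand-in for the id-file lock, not the HDFS client API.
final class BalancerLockDemo {
    static final String ID_PATH = "/system/balancer.id";

    /**
     * Attempts to take the id-file lock once per configured URI and returns
     * how many attempts succeeded. URIs with the same authority are assumed
     * to reach the same physical nameservice, so they share one lease table.
     */
    public static int acquireAll(List<String> namenodeUris) {
        Map<String, Set<String>> leasesByNameservice = new HashMap<>();
        int acquired = 0;
        for (String uri : namenodeUris) {
            String authority = URI.create(uri).getAuthority();
            Set<String> leases =
                leasesByNameservice.computeIfAbsent(authority, k -> new HashSet<>());
            // Set.add returns false when the path is already held, which stands in
            // for the AlreadyBeingCreatedException branch that returns null.
            if (leases.add(ID_PATH)) {
                acquired++;
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        // Two spellings of one nameservice: the second attempt is rejected.
        System.out.println(acquireAll(List.of("hdfs://sandbox/", "hdfs://sandbox"))); // prints 1

        // A single entry: the only attempt succeeds.
        System.out.println(acquireAll(List.of("hdfs://sandbox"))); // prints 1
    }
}
```

      In the first case only one of two attempts succeeds, which would match the "Another Balancer is running" exit even though no other balancer process exists.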


      Regards

      Attachments

        1. HDFS-8897.001.patch
          3 kB
          John Zhuge
        2. HDFS-8897.002.patch
          6 kB
          John Zhuge
        3. HDFS-8897.003.patch
          7 kB
          John Zhuge
        4. HDFS-8897.004.patch
          7 kB
          John Zhuge
        5. HDFS-8897.005.patch
          8 kB
          John Zhuge
        6. HDFS-8897.006.patch
          11 kB
          John Zhuge
        7. HDFS-8897-branch-2.006.patch
          11 kB
          John Zhuge

            People

              Assignee: John Zhuge (jzhuge)
              Reporter: Alexandre LINTE
              Votes: 1
              Watchers: 13
