Hadoop Common
  1. Hadoop Common
  2. HADOOP-5015

Separate block/replica management code from FSNamesystem

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently FSNamesystem contains a big amount of code that manages blocks and replicas. The code scatters in FSNamesystem and it is hard to read and maintain. It would be nice to move the code to a separate class called, for example, BlockManager.

      1. blkmanager.patch
        135 kB
        Suresh Srinivas
      2. blkmanager.patch
        136 kB
        Suresh Srinivas

        Issue Links

          Activity

          Hide
          Suresh Srinivas added a comment -

          Created 5782 to revert some of the formatting changes

          Show
          Suresh Srinivas added a comment - Created 5782 to revert some of the formatting changes
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk #828 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/828/)
          . Separate block management code from FSNamesystem. Contributed by Suresh Srinivas

          Show
          Hudson added a comment - Integrated in Hadoop-trunk #828 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/828/ ) . Separate block management code from FSNamesystem. Contributed by Suresh Srinivas
          Hide
          Raghu Angadi added a comment - - edited

          > To make code review simpler, I have also retained the structure of the code moved from FSNamesystem.java as it is in BlockManager.java.

          There are a lot of formatting changes that break existing patches for HDFS :

          e.g. segment (ideally it should have been a clean cut-n-paste) :

               synchronized (neededReplications) {
          -      out.println("Metasave: Blocks waiting for replication: " +
          -                  neededReplications.size());
          +      out.println("Metasave: Blocks waiting for replication: "
          +          + neededReplications.size());
                 for (Block block : neededReplications) {
          -        List<DatanodeDescriptor> containingNodes =
          -                                          new ArrayList<DatanodeDescriptor>();
          +        List<DatanodeDescriptor> containingNodes = new ArrayList<DatanodeDescriptor>();
                   NumberReplicas numReplicas = new NumberReplicas();
                   // source node returned is not used
                   chooseSourceDatanode(block, containingNodes, numReplicas);
          -        int usableReplicas = numReplicas.liveReplicas() +
          -                             numReplicas.decommissionedReplicas();
          +        int usableReplicas = numReplicas.liveReplicas()
          +            + numReplicas.decommissionedReplicas();
                   // l: == live:, d: == decommissioned c: == corrupt e: == excess
          -        out.print(block + " (replicas:" +
          -                  " l: " + numReplicas.liveReplicas() +
          -                  " d: " + numReplicas.decommissionedReplicas() +
          -                  " c: " + numReplicas.corruptReplicas() +
          -                  " e: " + numReplicas.excessReplicas() +
          -                  ((usableReplicas > 0)? "" : " MISSING") + ")");
          +        out.print(block + " (replicas:" + " l: " + numReplicas.liveReplicas()
          +            + " d: " + numReplicas.decommissionedReplicas() + " c: "
          +            + numReplicas.corruptReplicas() + " e: "
          +            + numReplicas.excessReplicas()
          +            + ((usableReplicas > 0) ? "" : " MISSING") + ")");
           
          -        for (Iterator<DatanodeDescriptor> jt = blocksMap.nodeIterator(block);
          -             jt.hasNext();) {
          +        for (Iterator<DatanodeDescriptor> jt = blocksMap.nodeIterator(block); jt
          +            .hasNext();) {
                     DatanodeDescriptor node = jt.next();
                     out.print(" " + node + " : ");
                   }
                   out.println("");
          

          Without such changes, it would have been much simpler to port the patches (just by changing the file name in patch file.

          Show
          Raghu Angadi added a comment - - edited > To make code review simpler, I have also retained the structure of the code moved from FSNamesystem.java as it is in BlockManager.java. There are a lot of formatting changes that break existing patches for HDFS : e.g. segment (ideally it should have been a clean cut-n-paste) : synchronized (neededReplications) { - out.println("Metasave: Blocks waiting for replication: " + - neededReplications.size()); + out.println("Metasave: Blocks waiting for replication: " + + neededReplications.size()); for (Block block : neededReplications) { - List<DatanodeDescriptor> containingNodes = - new ArrayList<DatanodeDescriptor>(); + List<DatanodeDescriptor> containingNodes = new ArrayList<DatanodeDescriptor>(); NumberReplicas numReplicas = new NumberReplicas(); // source node returned is not used chooseSourceDatanode(block, containingNodes, numReplicas); - int usableReplicas = numReplicas.liveReplicas() + - numReplicas.decommissionedReplicas(); + int usableReplicas = numReplicas.liveReplicas() + + numReplicas.decommissionedReplicas(); // l: == live:, d: == decommissioned c: == corrupt e: == excess - out.print(block + " (replicas:" + - " l: " + numReplicas.liveReplicas() + - " d: " + numReplicas.decommissionedReplicas() + - " c: " + numReplicas.corruptReplicas() + - " e: " + numReplicas.excessReplicas() + - ((usableReplicas > 0)? "" : " MISSING") + ")"); + out.print(block + " (replicas:" + " l: " + numReplicas.liveReplicas() + + " d: " + numReplicas.decommissionedReplicas() + " c: " + + numReplicas.corruptReplicas() + " e: " + + numReplicas.excessReplicas() + + ((usableReplicas > 0) ? "" : " MISSING") + ")"); - for (Iterator<DatanodeDescriptor> jt = blocksMap.nodeIterator(block); - jt.hasNext();) { + for (Iterator<DatanodeDescriptor> jt = blocksMap.nodeIterator(block); jt + .hasNext();) { DatanodeDescriptor node = jt.next(); out.print(" " + node + " : "); } out.println(""); Without such changes, it would have been much simpler to port the patches (just by changing the file name in patch file.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Suresh!

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this. Thanks, Suresh!
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          I will commit this later today.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 patch looks good. I will commit this later today.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407169/blkmanager.patch
          against trunk revision 771661.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12407169/blkmanager.patch against trunk revision 771661. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/291/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The code refactoriing is good. It helps a lot in code readability.

          Show
          Tsz Wo Nicholas Sze added a comment - The code refactoriing is good. It helps a lot in code readability.
          Hide
          Suresh Srinivas added a comment -

          Incorporated all the comments in the attached patch, except the following:

          1. Not moving generationStamp to blockManager as it is used in FSNamesystem and updated during file creation
          2. blockInvalidateLimit can be renamed in a separate change
          3. replicationRecheckInterval should be moved into ReplicationMonitor in a separate change
          Show
          Suresh Srinivas added a comment - Incorporated all the comments in the attached patch, except the following: Not moving generationStamp to blockManager as it is used in FSNamesystem and updated during file creation blockInvalidateLimit can be renamed in a separate change replicationRecheckInterval should be moved into ReplicationMonitor in a separate change
          Hide
          dhruba borthakur added a comment -

          This patch is not related to HADOOP-3799. This patch attempts to separate out the block manager code from the namespace manager code whereas HADOOP-3799 makes the replica-placement pluggable.

          Show
          dhruba borthakur added a comment - This patch is not related to HADOOP-3799 . This patch attempts to separate out the block manager code from the namespace manager code whereas HADOOP-3799 makes the replica-placement pluggable.
          Hide
          Jeff Hammerbacher added a comment -

          Hey,

          Could someone with a deeper knowledge of HDFS internals comment on what work would need to be done after this refactoring to complete https://issues.apache.org/jira/browse/HADOOP-3799? Any insight here would be much appreciated!

          Later,
          Jeff

          Show
          Jeff Hammerbacher added a comment - Hey, Could someone with a deeper knowledge of HDFS internals comment on what work would need to be done after this refactoring to complete https://issues.apache.org/jira/browse/HADOOP-3799? Any insight here would be much appreciated! Later, Jeff
          Hide
          dhruba borthakur added a comment -

          This refactoring looks great!

          Show
          dhruba borthakur added a comment - This refactoring looks great!
          Hide
          Konstantin Shvachko added a comment -

          This is a nice refactoring. A few comments:

          1. FSDirectory should have a method getBlockManager(). This will let you avoid unnecessary transient methods in FSNamesystem that only exist to call the respective BlockManager methods.
          2. BlocksWithLocations getBlocks() should remain in FSNamesystem. I think BlockManager should not be exposed to external types like BlocksWithLocations. Besides getBlocks() does not work with any intrinsic fields of BlockManager.
          3. It seems to me that ReplicationTargetChooser replicator should be also moved into BlockManager.
          4. I am not sure about the following fields, which may be related to block management as well, but the patch left them inside FSNamesystem:
            • generationStamp seem to be related
            • blockInvalidateLimit looks like unrelated, we may later want to rename it to datanodeInvalidateLimit.
            • ReplicationMonitor along with replicationRecheckInterval - probably related.
          Show
          Konstantin Shvachko added a comment - This is a nice refactoring. A few comments: FSDirectory should have a method getBlockManager() . This will let you avoid unnecessary transient methods in FSNamesystem that only exist to call the respective BlockManager methods. BlocksWithLocations getBlocks() should remain in FSNamesystem . I think BlockManager should not be exposed to external types like BlocksWithLocations . Besides getBlocks() does not work with any intrinsic fields of BlockManager . It seems to me that ReplicationTargetChooser replicator should be also moved into BlockManager . I am not sure about the following fields, which may be related to block management as well, but the patch left them inside FSNamesystem : generationStamp seem to be related blockInvalidateLimit looks like unrelated, we may later want to rename it to datanodeInvalidateLimit . ReplicationMonitor along with replicationRecheckInterval - probably related.
          Hide
          Suresh Srinivas added a comment -

          As a first step towards separating out block management functionality from FSNamesystem.java, I have introduced a new class BlockManager.java. This new class is to be only used by FSNamesystem, using the synchronization as it exists today. To make code review simpler, I have also retained the structure of the code moved from FSNamesystem.java as it is in BlockManager.java. After this change, we could have more iterations to organize the code better with in BlockManager.java.

          Show
          Suresh Srinivas added a comment - As a first step towards separating out block management functionality from FSNamesystem.java, I have introduced a new class BlockManager.java. This new class is to be only used by FSNamesystem , using the synchronization as it exists today. To make code review simpler, I have also retained the structure of the code moved from FSNamesystem.java as it is in BlockManager.java. After this change, we could have more iterations to organize the code better with in BlockManager.java.
          Hide
          Hong Tang added a comment -

          My wish list:

          • Blocks are addressed through numeric IDs (160-bit or even longer, ID never recycled).
          • Inode Block (a block that contains the Block IDs of a file's data blocks.)
          • NN maintains mapping of path => Inode Block ID.
          Show
          Hong Tang added a comment - My wish list: Blocks are addressed through numeric IDs (160-bit or even longer, ID never recycled). Inode Block (a block that contains the Block IDs of a file's data blocks.) NN maintains mapping of path => Inode Block ID.
          Hide
          dhruba borthakur added a comment -

          This will be the first step to maybe run the BlockManager separately from the namespace manager! That will be pretty cool!

          Show
          dhruba borthakur added a comment - This will be the first step to maybe run the BlockManager separately from the namespace manager! That will be pretty cool!

            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Hairong Kuang
            • Votes:
              1 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development