Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is a further refactoring over HDFS-377 to move generic codes from DataXceiver to DataTransferProtocol.

      1. h524_20090803.patch
        11 kB
        Tsz Wo Nicholas Sze
      2. h524_20090804.patch
        12 kB
        Tsz Wo Nicholas Sze
      3. h524_20090804b.patch
        14 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          h524_20090803.patch:

          • In DataTransferProtocol.Receiver,
            • renamed op(..) to readOp(..),
            • changed the public opXxxx(DataInputStream in) methods to private, and
            • changed the public abstract methods to protected.
          • Moved the switch-case statement from DataXceiver to DataTransferProtocol.Receiver.processOp(..).

          Note that there is no incompatible change since the DataTransferProtocol.Receiver APIs are not yet released.

          Show
          Tsz Wo Nicholas Sze added a comment - h524_20090803.patch: In DataTransferProtocol.Receiver, renamed op(..) to readOp(..), changed the public opXxxx(DataInputStream in) methods to private, and changed the public abstract methods to protected. Moved the switch-case statement from DataXceiver to DataTransferProtocol.Receiver.processOp(..). Note that there is no incompatible change since the DataTransferProtocol.Receiver APIs are not yet released.
          Hide
          Konstantin Boudnik added a comment -

          A couple of comments (perhaps because of my ignorance):

          • in DataXceiver: is datanode.getXceiverCount() is so costly that it really make sense performance wise?
            -      LOG.debug(datanode.dnRegistration + ":Number of active connections is: "
            -                               + datanode.getXceiverCount());
            +      if (LOG.isDebugEnabled()) {
            +        LOG.debug(datanode.dnRegistration + ":Number of active connections is: "
            +            + datanode.getXceiverCount());
            +      }
            
          • it seems that the switch statement won't grow up anytime soon, but wouldn't it be better looking and perhaps more efficient to replace it with polymorphism? It will increase a number of classes though, but the control flow would be better enclosed within new classes. I.e. the operations like opReadBlock(), opBlockChecksum(), etc. could be moved to a smaller classes like OpBlockRead, OpBlockCheckSum which will expose only one public method, say, doOperation().
          Show
          Konstantin Boudnik added a comment - A couple of comments (perhaps because of my ignorance): in DataXceiver: is datanode.getXceiverCount() is so costly that it really make sense performance wise? - LOG.debug(datanode.dnRegistration + ":Number of active connections is: " - + datanode.getXceiverCount()); + if (LOG.isDebugEnabled()) { + LOG.debug(datanode.dnRegistration + ":Number of active connections is: " + + datanode.getXceiverCount()); + } it seems that the switch statement won't grow up anytime soon, but wouldn't it be better looking and perhaps more efficient to replace it with polymorphism? It will increase a number of classes though, but the control flow would be better enclosed within new classes. I.e. the operations like opReadBlock(), opBlockChecksum(), etc. could be moved to a smaller classes like OpBlockRead, OpBlockCheckSum which will expose only one public method, say, doOperation().
          Hide
          Tsz Wo Nicholas Sze added a comment -

          in DataXceiver: is datanode.getXceiverCount() is so costly that it really make sense performance wise?

          getXceiverCount() alone is not costly. However, constructing the string, which creates intermediate objects, inside LOG.debug(..) is unnecessary. Since we often got gc problem, I think it is good to minimize object creations.

          it seems that the switch statement won't grow up anytime soon, but wouldn't it be better looking and perhaps more efficient to replace it with polymorphism? It will increase a number of classes though, but the control flow would be better enclosed within new classes. I.e. the operations like opReadBlock(), opBlockChecksum(), etc. could be moved to a smaller classes like OpBlockRead, OpBlockCheckSum which will expose only one public method, say, doOperation().

          This is a good idea. Let's discuss class structure in more details.

          In the patch, we have

          //DataTransferProtocol.java
            public enum Op {
              WRITE_BLOCK((byte)80),
              ...
            }
          
            public static abstract class Receiver {
              protected final void processOp(Op op, DataInputStream in
                  ) throws IOException {
                switch(op) {
                case WRITE_BLOCK:
                  opWriteBlock(in);
                  break;
                 ...
               }
          
              private final void opWriteBlock(DataInputStream in) throws IOException {
                final long blockId = in.readLong();
                 ...
                opWriteBlock(in, blockId, blockGs, pipelineSize, isRecovery,
                    client, src, targets, accesstoken);
              }
          
              protected abstract void opWriteBlock(DataInputStream in,
                  long blockId, long blockGs, int pipelineSize, boolean isRecovery,
                  String client, DatanodeInfo src, DatanodeInfo[] targets,
                  AccessToken accesstoken) throws IOException;
             }
            }
          

          In the implementation, DataXceiver extends DataTransferProtocol.Receiver and implements all the abstract methods.

          //DataXceiver.java
          class DataXceiver extends DataTransferProtocol.Receiver
              implements Runnable, FSConstants {
            ...
            public void opWriteBlock(DataInputStream in, long blockId, long blockGs,
                int pipelineSize, boolean isRecovery,
                String client, DatanodeInfo srcDataNode, DatanodeInfo[] targets,
                AccessToken accessToken) throws IOException {
              ...
            }
          }
          

          If we have the proposed classes OpBlockRead, OpBlockCheckSum, etc., then how should DataXceiver implement the abstract methods?

          Also, how to relate the new classes OpBlockRead and OpBlockCheckSum with the enum Op?

          Show
          Tsz Wo Nicholas Sze added a comment - in DataXceiver: is datanode.getXceiverCount() is so costly that it really make sense performance wise? getXceiverCount() alone is not costly. However, constructing the string, which creates intermediate objects, inside LOG.debug(..) is unnecessary. Since we often got gc problem, I think it is good to minimize object creations. it seems that the switch statement won't grow up anytime soon, but wouldn't it be better looking and perhaps more efficient to replace it with polymorphism? It will increase a number of classes though, but the control flow would be better enclosed within new classes. I.e. the operations like opReadBlock(), opBlockChecksum(), etc. could be moved to a smaller classes like OpBlockRead, OpBlockCheckSum which will expose only one public method, say, doOperation(). This is a good idea. Let's discuss class structure in more details. In the patch, we have //DataTransferProtocol.java public enum Op { WRITE_BLOCK(( byte )80), ... } public static abstract class Receiver { protected final void processOp(Op op, DataInputStream in ) throws IOException { switch (op) { case WRITE_BLOCK: opWriteBlock(in); break ; ... } private final void opWriteBlock(DataInputStream in) throws IOException { final long blockId = in.readLong(); ... opWriteBlock(in, blockId, blockGs, pipelineSize, isRecovery, client, src, targets, accesstoken); } protected abstract void opWriteBlock(DataInputStream in, long blockId, long blockGs, int pipelineSize, boolean isRecovery, String client, DatanodeInfo src, DatanodeInfo[] targets, AccessToken accesstoken) throws IOException; } } In the implementation, DataXceiver extends DataTransferProtocol.Receiver and implements all the abstract methods. //DataXceiver.java class DataXceiver extends DataTransferProtocol.Receiver implements Runnable , FSConstants { ... public void opWriteBlock(DataInputStream in, long blockId, long blockGs, int pipelineSize, boolean isRecovery, String client, DatanodeInfo srcDataNode, DatanodeInfo[] targets, AccessToken accessToken) throws IOException { ... } } If we have the proposed classes OpBlockRead, OpBlockCheckSum, etc., then how should DataXceiver implement the abstract methods? Also, how to relate the new classes OpBlockRead and OpBlockCheckSum with the enum Op?
          Hide
          Konstantin Boudnik added a comment -

          As I've suspected it was my ignorance talking - not me

          After some more careful reading of the code, it turned out to me that

          • while Op might be a good candidate to become a factory which will be handling instantiation of all these new operation specific classes
          • DataXceiver might not need to extend DataTransferProtocol.Receiver as soon as all operations are pulled out into a separate classes
            it won't solve all the problems.

          The biggest one, in my opinion, is that DataTransferProtocol.Receiver has two inner classes Sender and Receiver which have their own sets of unique operations to read, write, and copy blocks. Although, it might be possible to generalize those with some combination of polymorphism and delegation, it won't improve the current situation and will, perhaps, degrade the existing design.

          I guess I'm ready to withdraw my initial comment - thanks for crashing it !

          Show
          Konstantin Boudnik added a comment - As I've suspected it was my ignorance talking - not me After some more careful reading of the code, it turned out to me that while Op might be a good candidate to become a factory which will be handling instantiation of all these new operation specific classes DataXceiver might not need to extend DataTransferProtocol.Receiver as soon as all operations are pulled out into a separate classes it won't solve all the problems. The biggest one, in my opinion, is that DataTransferProtocol.Receiver has two inner classes Sender and Receiver which have their own sets of unique operations to read, write, and copy blocks. Although, it might be possible to generalize those with some combination of polymorphism and delegation, it won't improve the current situation and will, perhaps, degrade the existing design. I guess I'm ready to withdraw my initial comment - thanks for crashing it !
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I agree that the current Sender and Receiver classes aren't the best design. Since we probably will re-implement everything with Avro in the future, let's keep the changes minimal.

          h524_20090804.patch: changed public methods in DataXceiver to protected.

          Show
          Tsz Wo Nicholas Sze added a comment - I agree that the current Sender and Receiver classes aren't the best design. Since we probably will re-implement everything with Avro in the future, let's keep the changes minimal. h524_20090804.patch: changed public methods in DataXceiver to protected.
          Hide
          Konstantin Boudnik added a comment -
          • final along with private seems to be a redundant declaration, 'cause private modifier assumes that won't be overridden.
          • would it make sense to have a separate metric update method to replace nearly identical code like
            +    datanode.myMetrics.writeBlockOp.inc(DataNode.now() - opStartTime);
            +    final boolean local = s.getInetAddress().equals(s.getLocalAddress());
            +    if (local)
            +      datanode.myMetrics.writesFromLocalClient.inc();
            +    else
            +      datanode.myMetrics.writesFromRemoteClient.inc();
            

            which would handle and wrap the update logic? Perhaps updateDNMetrics(MetricsBase metricType) or similar?

          • also,
            +    final boolean local = s.getInetAddress().equals(s.getLocalAddress());
            

            might be converted to the class level constant to avoid an extra initialization on each call of opWriteBlock and opReadBlock

          I know that only public methods should have JavaDoc, but wouldn't it be good to have them for protected ones as well?

          Show
          Konstantin Boudnik added a comment - final along with private seems to be a redundant declaration, 'cause private modifier assumes that won't be overridden. would it make sense to have a separate metric update method to replace nearly identical code like + datanode.myMetrics.writeBlockOp.inc(DataNode.now() - opStartTime); + final boolean local = s.getInetAddress().equals(s.getLocalAddress()); + if (local) + datanode.myMetrics.writesFromLocalClient.inc(); + else + datanode.myMetrics.writesFromRemoteClient.inc(); which would handle and wrap the update logic? Perhaps updateDNMetrics(MetricsBase metricType) or similar? also, + final boolean local = s.getInetAddress().equals(s.getLocalAddress()); might be converted to the class level constant to avoid an extra initialization on each call of opWriteBlock and opReadBlock I know that only public methods should have JavaDoc, but wouldn't it be good to have them for protected ones as well?
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h524_20090804b.patch:

          • removed final from private methods.
          • added static methods, updateDuration(..) and updateCounter(..) in DataXceiver
          • added a new field isLocal in DataXceiver.
          • added javadoc for protected methods.
          • also changed the fields in DataXceiver to private.

          Thanks Cos for the review.

          Show
          Tsz Wo Nicholas Sze added a comment - h524_20090804b.patch: removed final from private methods. added static methods, updateDuration(..) and updateCounter(..) in DataXceiver added a new field isLocal in DataXceiver. added javadoc for protected methods. also changed the fields in DataXceiver to private. Thanks Cos for the review.
          Hide
          Konstantin Boudnik added a comment -

          +1 patch looks good!

          Show
          Konstantin Boudnik added a comment - +1 patch looks good!
          Hide
          Tsz Wo Nicholas Sze added a comment -
               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no new tests are needed for this patch.
               [exec]                         Also please list what manual steps were performed to verify this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          No new tests added since this is a code refactoring. Running unit tests locally...

          Show
          Tsz Wo Nicholas Sze added a comment - [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. No new tests added since this is a code refactoring. Running unit tests locally...
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Passed all unit tests.

          I have committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - Passed all unit tests. I have committed this.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #40 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/40/)
          . Further DataTransferProtocol code refactoring.

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #40 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/40/ ) . Further DataTransferProtocol code refactoring.

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development