Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1400

HDFS federation: Introduce block pool ID into DataTransferProtocol

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Federation Branch
    • Fix Version/s: Federation Branch
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed

      Description

      Block Pool ID needs to be introduced in to DataTransferProtocol

      1. HDFS-1400.1.patch
        48 kB
        Suresh Srinivas
      2. HDFS-1400.patch
        56 kB
        Suresh Srinivas

        Issue Links

          Activity

          Hide
          Suresh Srinivas added a comment -

          Currently DataTransferProtocol has methods such as:

              public static void opReadBlock(DataOutputStream out, long blockId,
                  long blockGs, long blockOffset, long blockLen, String clientName,
                  Token<BlockTokenIdentifier> blockToken) throws IOException;
          

          The client has to pass the individual elements that make block identification such as blockId and generation stamp. While adding block pool ID, I propose methods with the following format:

              public static void opReadBlock(DataOutputStream out, ExtendedBlock block,
                  long blockOffset, long blockLen, String clientName,
                  Token<BlockTokenIdentifier> blockToken) throws IOException;
          

          With this, the client need not understand the internals of ExtendedBlock. It receives ExtendedBlock over RPC and sends it in DataTransferProtocol. This helps in making ExtendedBlock opaque to the client.

          Show
          Suresh Srinivas added a comment - Currently DataTransferProtocol has methods such as: public static void opReadBlock(DataOutputStream out, long blockId, long blockGs, long blockOffset, long blockLen, String clientName, Token<BlockTokenIdentifier> blockToken) throws IOException; The client has to pass the individual elements that make block identification such as blockId and generation stamp. While adding block pool ID, I propose methods with the following format: public static void opReadBlock(DataOutputStream out, ExtendedBlock block, long blockOffset, long blockLen, String clientName, Token<BlockTokenIdentifier> blockToken) throws IOException; With this, the client need not understand the internals of ExtendedBlock. It receives ExtendedBlock over RPC and sends it in DataTransferProtocol. This helps in making ExtendedBlock opaque to the client.
          Hide
          Suresh Srinivas added a comment -

          First version of the patch with DataTransferProtocol changes. Since the change to use block pool ID is not complete, there are a lot of TODO:FEDERATION comments identifying further changes that are required.

          In tests TestDataTransferProtocol and TestBlockReplacement, there were instances where DatanodeTransfer protocol helper methods were not used to send requests. Instead the test serialized and wrote individual elements of the protocol request. I have changed this to use helper methods. This helps in catching the future protocol changes as compile time error instead of runtime.

          Show
          Suresh Srinivas added a comment - First version of the patch with DataTransferProtocol changes. Since the change to use block pool ID is not complete, there are a lot of TODO:FEDERATION comments identifying further changes that are required. In tests TestDataTransferProtocol and TestBlockReplacement, there were instances where DatanodeTransfer protocol helper methods were not used to send requests. Instead the test serialized and wrote individual elements of the protocol request. I have changed this to use helper methods. This helps in catching the future protocol changes as compile time error instead of runtime.
          Hide
          dhruba borthakur added a comment -

          The proposed format of passing in the ExtendedBlock (instead of individual blockid, genstamp, etc) looks much cleaner.

          Show
          dhruba borthakur added a comment - The proposed format of passing in the ExtendedBlock (instead of individual blockid, genstamp, etc) looks much cleaner.
          Hide
          Konstantin Shvachko added a comment -

          Agree with Dhruba. Passing Block even in current code would be a good refactoring.

          1. I propose to do it in 2 phases. First, refactor trunk by aggregating blockId and GS into Block parameters across the entire data-transfer code, then replacing it with ExtendedBlock in the federation branch. This will take a bit more time now, but will save a lot of time later merging trunk changes to the branch.
          2. With your patch you unnecessarily write block length in the serialized message passed to via DataTransferProtocol. The only thing necessary there is the requested data length. This is different from current implementation, and may cause confusion in the future as people wouldn't know that the block length is actually not required.
          Show
          Konstantin Shvachko added a comment - Agree with Dhruba. Passing Block even in current code would be a good refactoring. I propose to do it in 2 phases. First, refactor trunk by aggregating blockId and GS into Block parameters across the entire data-transfer code, then replacing it with ExtendedBlock in the federation branch. This will take a bit more time now, but will save a lot of time later merging trunk changes to the branch. With your patch you unnecessarily write block length in the serialized message passed to via DataTransferProtocol. The only thing necessary there is the requested data length. This is different from current implementation, and may cause confusion in the future as people wouldn't know that the block length is actually not required.
          Hide
          Suresh Srinivas added a comment -

          Thanks for the review. Attached is the trunk version of the patch.

          A block has two pieces of information.

          1. Block identifier: this include Block ID and Generation Stamp.
          2. Block attribute/information: len

          I have introduced writeId() and readId() methods to serialize/deserialize only identifier part of Block to be used in DataTransferProtocol.

          Show
          Suresh Srinivas added a comment - Thanks for the review. Attached is the trunk version of the patch. A block has two pieces of information. Block identifier: this include Block ID and Generation Stamp. Block attribute/information: len I have introduced writeId() and readId() methods to serialize/deserialize only identifier part of Block to be used in DataTransferProtocol.
          Hide
          Suresh Srinivas added a comment -

          I created a separate jira HDFS-1407 for this. Please review the patch for the trunk on that jira. Once it is committed, I will attach a separate patch for federation branch.

          Show
          Suresh Srinivas added a comment - I created a separate jira HDFS-1407 for this. Please review the patch for the trunk on that jira. Once it is committed, I will attach a separate patch for federation branch.
          Hide
          Suresh Srinivas added a comment -

          New patch with a part of the previous change made in HDFS-1407 merged from trunk.

          Show
          Suresh Srinivas added a comment - New patch with a part of the previous change made in HDFS-1407 merged from trunk.
          Hide
          Suresh Srinivas added a comment -

          I ran unit tests. There are some known failures:
          TestStartup
          TestDFSUpgradeFromImage
          TestCheckpoint

          Here is the testpatch result:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 18 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system tests framework. The patch passed system tests framework compile.

          Show
          Suresh Srinivas added a comment - I ran unit tests. There are some known failures: TestStartup TestDFSUpgradeFromImage TestCheckpoint Here is the testpatch result: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 18 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system tests framework. The patch passed system tests framework compile.
          Hide
          Konstantin Shvachko added a comment -

          Looks good. Couple of things:

          1. Did not find any calls to public method BlockReader.getFileName(addr, poolId, blockId). If this is true could you please remove it for now.
          2. Do we need to increment DATA_TRANSFER_VERSION if its methods parameters are changing?
          3. It is better to make descriptions of methods as javadoc comments.

          +1 besides this minor things.

          Show
          Konstantin Shvachko added a comment - Looks good. Couple of things: Did not find any calls to public method BlockReader.getFileName(addr, poolId, blockId). If this is true could you please remove it for now. Do we need to increment DATA_TRANSFER_VERSION if its methods parameters are changing? It is better to make descriptions of methods as javadoc comments. +1 besides this minor things.
          Hide
          Suresh Srinivas added a comment -

          Thanks for the review, Konstantin.
          > Did not find any calls to public method BlockReader.getFileName(addr, poolId, blockId).
          Removed BlockReader.getFileName(addr, blockid) and calls are made only to BlockReader.getFileName(addr, poolId, blockId)

          > Do we need to increment DATA_TRANSFER_VERSION if its methods parameters are changing?
          Done

          > It is better to make descriptions of methods as javadoc comments.
          Added javadoc comments to BlockReader#getFileName()

          Show
          Suresh Srinivas added a comment - Thanks for the review, Konstantin. > Did not find any calls to public method BlockReader.getFileName(addr, poolId, blockId). Removed BlockReader.getFileName(addr, blockid) and calls are made only to BlockReader.getFileName(addr, poolId, blockId) > Do we need to increment DATA_TRANSFER_VERSION if its methods parameters are changing? Done > It is better to make descriptions of methods as javadoc comments. Added javadoc comments to BlockReader#getFileName()
          Hide
          Suresh Srinivas added a comment -

          I committed the patch to Federation branch.

          Show
          Suresh Srinivas added a comment - I committed the patch to Federation branch.

            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Suresh Srinivas
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development