Hadoop HDFS / HDFS-1458

Improve checkpoint performance by avoiding unnecessary image downloads

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.23.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the secondary NameNode could verify that the image on its disk is the same as the one on the primary NameNode, it could skip downloading the image from the primary NN, eliminating the image download overhead entirely.

      1. trunkNoDownloadImage3.patch
        9 kB
        Hairong Kuang
      2. trunkNoDownloadImage2.patch
        9 kB
        Hairong Kuang
      3. trunkNoDownloadImage1.patch
        9 kB
        Hairong Kuang
      4. trunkNoDownloadImage.patch
        6 kB
        Hairong Kuang
      5. checkpoint-checkfsimageissame.patch
        10 kB
        Yilei Lu

          Activity

          Hairong Kuang created issue -
          Hairong Kuang made changes -
          Field Original Value New Value
          Link This issue relates to HDFS-1435 [ HDFS-1435 ]
          Yilei Lu added a comment -

          Check the NameNode's fsimage against the SecondaryNameNode's by RPC when downloading checkpoint files. If they are the same, the SecondaryNameNode will not download the fsimage from the NameNode.

          Yilei Lu made changes -
          Attachment checkpoint-checkfsimageissame.patch [ 12457236 ]
          Hairong Kuang made changes -
          Link This issue is related to HDFS-903 [ HDFS-903 ]
          Hairong Kuang added a comment -

          Lu, thanks for the patch. It looks good. Two comments:

          1. Can we avoid computing the checksum on the fly? I linked this issue to HDFS-903. It would be nice if it were computed when the image is saved to disk.
          2. Can the checksum be returned with rollEditLog?
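
          A minimal sketch of the first suggestion, assuming the image is written through a single output stream: wrapping that stream in java.security.DigestOutputStream computes an MD5 digest as the bytes are written, so no second pass over the saved file is needed. The names ImageDigestSketch and saveWithDigest are hypothetical and are not taken from any patch on this issue or on HDFS-903.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the HDFS implementation.
final class ImageDigestSketch {

    /** Writes imageBytes to out and returns the hex MD5 of exactly what was written. */
    static String saveWithDigest(byte[] imageBytes, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // DigestOutputStream updates the digest as a side effect of every write.
        try (DigestOutputStream dos = new DigestOutputStream(out, md5)) {
            dos.write(imageBytes);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // An in-memory sink stands in for the image file on disk.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] fakeImage = "fake fsimage".getBytes(StandardCharsets.UTF_8);
        System.out.println(saveWithDigest(fakeImage, sink));
    }
}
```

          A digest stored this way could then be handed to the secondary without rereading the image, which is what the second question about rollEditLog is asking for.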

          dhruba borthakur added a comment -

          If we are recomputing the checksum every time at the primary NN, then this patch does not reduce load on the primary NameNode, does it?

          Hairong Kuang added a comment -

          It will reduce the network traffic.

          Besides using a checksum, another option is to use the checkpoint time, but it is not as reliable as a checksum.

          Hairong Kuang added a comment -

          When discussing this with Dhruba, I realized that the secondary does not need to load the image if the primary and the secondary have the same image. This will give yet another boost to checkpoint performance.

          Hairong Kuang made changes -
          Summary Improve checkpoint performance by avoiding uncessary image downloads Improve checkpoint performance by avoiding unnecessary image downloads
          Konstantin Shvachko added a comment -

          Just curious,

          • why do we bother improving a deprecated SNN for 0.22, if we have a direct replacement CheckpointNode?
          • or is it intended for FB-0.20?
          • This is already done in BackupNode / CheckpointNode by verifying the checkpoint time and the versions.
            You can add the verification of the checksum and it's done.

          What am I missing?

          Hairong Kuang added a comment -

          Cool! I did not realize that BackupNode/CheckpointNode already does this. I will take a look. FB will deploy AvatarNode, so it does not need this either. But I guess the community is still using the Secondary NN, so there might be a need for this optimization.

          Yilei Lu added a comment -

          To Hairong:
          Thanks for your good idea.
          1. We can generate a UUID when the SNN saves the fsimage, and add new fields to hold it. If the UUID of the NameNode equals the UUID of the SNN, we do not need to download the fsimage from the NameNode.
          2. Can the checksum be returned with rollEditLog? Yes.

          Hairong Kuang added a comment -

          This patch does the proposed optimization. If the image checksum at the secondary NameNode is the same as the one at the primary NameNode, the secondary neither downloads the image from the primary nor loads the image from disk.
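
          The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: the class CheckpointSketch, the Action enum, and the decide method are hypothetical names, and the digests are assumed to be available as byte arrays (with null meaning the secondary has no image yet).

```java
import java.util.Arrays;

// Hypothetical illustration only; not the code from the attached patch.
final class CheckpointSketch {

    /** What the secondary does with the image before merging the edits. */
    enum Action { DOWNLOAD_AND_LOAD, SKIP_DOWNLOAD_AND_LOAD }

    /**
     * Compares the primary's image digest with the secondary's.
     * A null digest on either side is treated as "unknown", which forces a download.
     */
    static Action decide(byte[] primaryDigest, byte[] secondaryDigest) {
        if (primaryDigest != null
                && secondaryDigest != null
                && Arrays.equals(primaryDigest, secondaryDigest)) {
            // Same image on both sides: no transfer and no reload needed.
            return Action.SKIP_DOWNLOAD_AND_LOAD;
        }
        return Action.DOWNLOAD_AND_LOAD;
    }

    public static void main(String[] args) {
        byte[] primary = {0x0a, 0x0b};
        System.out.println(decide(primary, new byte[] {0x0a, 0x0b}));
        System.out.println(decide(primary, null));
    }
}
```

          Treating an unknown digest as a mismatch keeps the sketch conservative: a missing or unreadable checksum costs one extra download rather than risking a checkpoint built on a stale image.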

          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage.patch [ 12459185 ]
          Hairong Kuang added a comment -

          I made this patch applicable to trunk. I also added a unit test to it.

          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage1.patch [ 12460295 ]
          Hairong Kuang added a comment -

          Note that this patch uses a checksum to determine whether the primary and secondary have the same image. Once HDFS-1073 is in, we could switch to using the transaction ID.

          dhruba borthakur added a comment -

          +1, code looks good to me.

          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage1.patch [ 12460340 ]
          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage1.patch [ 12460340 ]
          Hairong Kuang added a comment -

          This patch makes a minor change to the unit test.

          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage2.patch [ 12460341 ]
          Hairong Kuang added a comment -

          Fixed a Findbugs warning.

          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage3.patch [ 12460342 ]
          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage3.patch [ 12460342 ]
          Hairong Kuang made changes -
          Attachment trunkNoDownloadImage3.patch [ 12460343 ]
          Hairong Kuang added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system test framework. The patch passed system test framework compile.
          [exec]

          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.23.0 [ 12315571 ]
          Fix Version/s 0.22.0 [ 12314241 ]
          Hairong Kuang added a comment -

          I ran the tests on my local machine. These are the failed tests:
          TestBlockRecovery
          TestBlockTokenwithDFS
          TestDFSClientRetries
          TestStorageRestore
          TestSaveNamespace

          None of them are related to my patch.

          Hairong Kuang added a comment -

          I've just committed this.

          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Todd Lipcon made changes -
          Link This issue breaks HDFS-1904 [ HDFS-1904 ]
          Todd Lipcon added a comment -

          Hey Hairong, Dhruba. Would you mind taking a look at HDFS-1904? It seems that this patch either caused or exposed a problem with how we handle the implicit mkdirs of intermediate path components.

          Hairong Kuang made changes -
          Link This issue breaks HDFS-1627 [ HDFS-1627 ]
          Hairong Kuang added a comment -

          Todd, you need the fix to HDFS-1627. We have been running this and HDFS-1627, together with image compression, for around 2 months on our large cluster. All seem to be pretty stable and have improved NN availability/responsiveness a lot.


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              6
