Hadoop HDFS / HDFS-1458

Improve checkpoint performance by avoiding unnecessary image downloads

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.23.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the secondary NameNode could verify that the image on its disk is the same as the one on the primary NameNode, it could skip downloading the image from the primary NN, eliminating the image download overhead entirely.
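
For illustration only, here is a minimal Java sketch of the check described above, assuming the two images are compared by MD5 digest. The class `ImageDigestCheck` and its methods are hypothetical names invented for this example; they are not HDFS APIs, and the committed patch may be structured differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of HDFS. */
final class ImageDigestCheck {

    /** Streams the file through MD5 so a large image is never held in memory. */
    static byte[] md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n);
            }
        }
        return md5.digest();
    }

    /**
     * Returns true unless a local image exists and its digest equals the
     * digest reported by the primary (assumed to arrive by some RPC).
     */
    static boolean needsDownload(Path localImage, byte[] primaryDigest)
            throws IOException, NoSuchAlgorithmException {
        if (!Files.isRegularFile(localImage)) {
            return true; // nothing on disk to compare against
        }
        return !MessageDigest.isEqual(md5Of(localImage), primaryDigest);
    }
}
```

In this sketch the secondary hashes its own copy; only the small digest has to cross the network when the images match.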

      1. checkpoint-checkfsimageissame.patch
        10 kB
        Yilei Lu
      2. trunkNoDownloadImage.patch
        6 kB
        Hairong Kuang
      3. trunkNoDownloadImage1.patch
        9 kB
        Hairong Kuang
      4. trunkNoDownloadImage2.patch
        9 kB
        Hairong Kuang
      5. trunkNoDownloadImage3.patch
        9 kB
        Hairong Kuang

          Activity

          Yilei Lu added a comment -

          When downloading checkpoint files, compare the NameNode's fsimage with the SecondaryNameNode's fsimage via RPC. If they are the same, the SecondaryNameNode does not download the fsimage from the NameNode.

          Hairong Kuang added a comment -

          Lu, thanks for the patch. It looks good. Two comments:

          1. Can we avoid computing the checksum on the fly? I linked this issue to HDFS-903. It would be nice if the checksum were computed when the image is saved to disk.
          2. Can the checksum be returned with rollEditLog?
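
A rough sketch of the first suggestion, computing the digest as a side effect of saving the image instead of re-reading the file later. It uses the JDK's `DigestOutputStream`; the class `DigestingImageWriter` and the method `saveAndDigest` are hypothetical names for this example, and MD5 is an assumption rather than something this thread specifies.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical writer for illustration; not part of HDFS. */
final class DigestingImageWriter {

    /** Writes the image bytes and returns the MD5 accumulated during the write. */
    static byte[] saveAndDigest(Path target, byte[] imageBytes)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Every byte written to the file also updates the digest.
        try (OutputStream out =
                 new DigestOutputStream(Files.newOutputStream(target), md5)) {
            out.write(imageBytes);
        }
        return md5.digest();
    }
}
```

The returned digest could then be kept next to the image and handed back on a later request, so no extra pass over the file is needed at checkpoint time.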

          dhruba borthakur added a comment -

          If we recompute the checksum every time at the primary NN, then this patch does not reduce load on the primary NameNode, does it?

          Hairong Kuang added a comment -

          It will reduce the network traffic.

          Besides using a checksum, another option is to use the checkpoint time, but it is not as reliable as a checksum.

          Hairong Kuang added a comment -

          When discussing this with Dhruba, I realized that the secondary does not need to load the image if the primary and the secondary have the same image. This will give yet another boost on checkpoint performance.

          Konstantin Shvachko added a comment -

          Just curious,

          • why do we bother improving a deprecated SNN for 0.22, if we have a direct replacement CheckpointNode?
          • or is it intended for FB-0.20?
          • This is already done in BackupNode / CheckpointNode by verifying checkpoint time and the versions.
            You can add the verification of the checksum and it's done.

          What am I missing?

          Hairong Kuang added a comment -

          Cool! I did not realize that BackupNode/CheckpointNode already does this; I will take a look. FB will deploy AvatarNode, so it does not need this either. But I guess the community is still using the Secondary NN, so there might be a need for this optimization.

          Yilei Lu added a comment -

          To Hairong:
          Thanks for the good idea.
          1. We can generate a UUID when the SNN saves the fsimage, and add a new field for the UUID. If the UUID at the NameNode equals the UUID at the SNN, we do not need to download the NameNode's fsimage.
          2. Can the checksum be returned with rollEditLog? Yes.

          Hairong Kuang added a comment -

          This patch does the proposed optimization. If the image checksum at the secondary NameNode is the same as the one at the primary NameNode, the secondary neither downloads the image from the primary nor loads the image from disk.
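
The decision described in this comment can be sketched as follows. This is a simplified model, not the committed code: `CheckpointPlanner`, `Step`, and `plan` are hypothetical names, and the `MERGE_EDITS` step reflects the assumption that the secondary still has to apply new edits in every checkpoint round.

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one checkpoint round on the secondary; not HDFS code. */
final class CheckpointPlanner {

    enum Step { DOWNLOAD_IMAGE, LOAD_IMAGE, MERGE_EDITS }

    /**
     * @param secondaryDigest digest of the image the secondary already holds, or null if none
     * @param primaryDigest   digest reported by the primary
     */
    static List<Step> plan(byte[] secondaryDigest, byte[] primaryDigest) {
        boolean sameImage = secondaryDigest != null
                && primaryDigest != null
                && MessageDigest.isEqual(secondaryDigest, primaryDigest);
        List<Step> steps = new ArrayList<>();
        if (!sameImage) {
            // Images differ (or there is no local image): fetch and reload.
            steps.add(Step.DOWNLOAD_IMAGE);
            steps.add(Step.LOAD_IMAGE);
        }
        // Assumption: edits are merged in every round regardless of the image check.
        steps.add(Step.MERGE_EDITS);
        return steps;
    }
}
```

When the digests match, both the network transfer and the in-memory reload drop out of the plan, which is where the two savings discussed in this thread come from.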

          Hairong Kuang added a comment -

          I made this patch applicable to trunk. I also added a unit test to it.

          Hairong Kuang added a comment -

          Note that this patch uses a checksum to check whether the primary and the secondary have the same image. Once HDFS-1073 is in, we could switch to using the transaction ID.

          dhruba borthakur added a comment -

          +1, code looks good to me.

          Hairong Kuang added a comment -

          This patch makes a minor change to the unit test.

          Hairong Kuang added a comment -

          Fixed a findbugs warning.

          Hairong Kuang added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system test framework. The patch passed system test framework compile.
          [exec]

          Hairong Kuang added a comment -

          I ran the tests on my local machine. These are the failed tests:
          TestBlockRecovery
          TestBlockTokenwithDFS
          TestDFSClientRetries
          TestStorageRestore
          TestSaveNamespace

          None of them are related to my patch.

          Hairong Kuang added a comment -

          I've just committed this.

          Todd Lipcon added a comment -

          Hey Hairong, Dhruba. Would you mind taking a look at HDFS-1904? It seems that this patch either caused or exposed a problem with how we handle the implicit mkdirs of intermediate path components.

          Hairong Kuang added a comment -

          Todd, you need the fix to HDFS-1627. We have been running this and HDFS-1627, together with image compression, for around 2 months on our large cluster. Everything seems pretty stable and has improved NN availability/responsiveness a lot.


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              6
