Hadoop HDFS / HDFS-3941

Backport HDFS-3498 and HDFS-3601: update replica placement policy for the newly added "NodeGroup" topology layer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2.0, 1-win
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      With the additional "NodeGroup" layer enabled, the replica placement policy used in BlockPlacementPolicyWithNodeGroup is updated to follow these rules:
      0. No more than one replica is placed within a NodeGroup.
      1. The first replica is placed on the local node.
      2. The second and third replicas are placed on the same rack as each other, on a rack remote from the first replica.
      3. Other replicas are placed on random nodes, with the restriction that no more than two replicas are placed on the same rack if there are enough racks.
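
The rules above can be read as a set of checks on an ordered list of replica locations. The following is a minimal sketch of such a check, not the code in the patch: the `Node` type, the `violated_rules` function, and the reading of "enough racks" as "two slots per rack can hold every replica" are all assumptions made for illustration. It also assumes the writer runs on a datanode.

```python
from collections import Counter
from typing import List, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for a datanode's position in the topology."""
    rack: str
    node_group: str
    host: str


def violated_rules(writer: Node, replicas: List[Node], total_racks: int) -> List[int]:
    """Return the numbers of the rules (0-3) that `replicas` breaks.

    `replicas` is ordered: replicas[0] is the first replica, and so on.
    """
    broken = []

    # Rule 0: at most one replica per NodeGroup (a group is identified
    # here by its rack plus its own name).
    per_group = Counter((r.rack, r.node_group) for r in replicas)
    if any(count > 1 for count in per_group.values()):
        broken.append(0)

    # Rule 1: the first replica sits on the writer's own node.
    # Assumption: the writer runs on a datanode.
    if replicas and replicas[0] != writer:
        broken.append(1)

    # Rule 2: replicas two and three share a rack, and that rack is not
    # the first replica's rack.
    if len(replicas) >= 3:
        first, second, third = replicas[:3]
        if second.rack != third.rack or second.rack == first.rack:
            broken.append(2)

    # Rule 3: at most two replicas per rack, checked only when there are
    # enough racks. Assumption: "enough" means 2 * total_racks >= replicas.
    per_rack = Counter(r.rack for r in replicas)
    enough_racks = 2 * total_racks >= len(replicas)
    if enough_racks and any(count > 2 for count in per_rack.values()):
        broken.append(3)

    return broken
```

For example, with a writer on `/r1`, placing the second and third replicas in two different node groups of `/r2` breaks no rule, while placing both in the same node group of `/r2` breaks rule 0 only.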

      This patch also moves the replica removal policy from FSNameSystem into BlockPlacementPolicy, and updates it slightly so that, when a block is over-replicated, duplicate replicas within the same NodeGroup are removed first.
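
The removal preference described above might be sketched as follows. This is an illustration under stated assumptions rather than the patch's implementation: the `Replica` type and `choose_replica_to_delete` are hypothetical names, and breaking ties by least free space is an assumption that the description does not state.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Replica(NamedTuple):
    """Hypothetical record for one replica of an over-replicated block."""
    rack: str
    node_group: str
    host: str
    free_bytes: int  # free space on the hosting node (assumed tie-breaker)


def choose_replica_to_delete(replicas: List[Replica]) -> Replica:
    """Pick one replica to remove from a non-empty list of replicas."""
    # Group replicas by NodeGroup (rack plus group name).
    by_group = defaultdict(list)
    for replica in replicas:
        by_group[(replica.rack, replica.node_group)].append(replica)

    # Prefer replicas that share a NodeGroup with another replica.
    duplicated = [
        replica
        for members in by_group.values()
        if len(members) > 1
        for replica in members
    ]

    # Fall back to all replicas when no NodeGroup holds more than one.
    candidates = duplicated or list(replicas)

    # Assumption: among candidates, remove the one with the least free space.
    return min(candidates, key=lambda replica: replica.free_bytes)
```

With two replicas in the same node group and a third elsewhere, the sketch always picks one of the two co-located replicas, even if the third sits on a fuller node.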

      1. HDFS-3941.002.patch
        56 kB
        Jing Zhao
      2. HDFS-3941.003.patch
        49 kB
        Jing Zhao
      3. HDFS-3941.004.patch
        51 kB
        Jing Zhao
      4. HDFS-3941.005.patch
        53 kB
        Jing Zhao
      5. HDFS-3941.patch
        57 kB
        Junping Du

          Activity

          mattf Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          djp Junping Du added a comment -

          Per HDFS-4261, both Aaron and Suresh suggested we can reuse the JIRAs in trunk (HDFS-3498 and HDFS-3601). I think we can simply close this JIRA here if you are OK. Thanks.

          qwertymaniac Harsh J added a comment -

          Thanks, that would be okay. Can we have a JIRA tracking the backport as well (we can reopen/reuse this)?

          djp Junping Du added a comment -

          Harsh, we reached an agreement to backport it to branch-2 by reusing the JIRAs in trunk (some discussion in HDFS-4261). I will start the backport work once I finish the YARN work on trunk (YARN-18, YARN-19). Does this plan make sense to you?

          qwertymaniac Harsh J added a comment -

          This change did not go into any branch-2 release (but is in trunk) but has made it to branch-1, which is a feature regression between the two lines. Can it be backported to branch-2 after discussion as well, or removed from branch-1 if disagreed upon?

          djp Junping Du added a comment -

          Thanks Nicholas and Jing!

          szetszwo Tsz Wo Nicholas Sze added a comment -

          I have committed it. Thanks, Junping and Jing!

          szetszwo Tsz Wo Nicholas Sze added a comment -

          +1 the 005 patch looks good.

          jingzhao Jing Zhao added a comment -

          Updated the patch to make it more compatible with the trunk. Have run unit tests locally and all the testcases passed except TestNNThroughputBenchmark (reported in HDFS-4204).

          jingzhao Jing Zhao added a comment -

          Changed the names of the three new methods in BlockPlacementPolicy back to be compatible with the current trunk.

          jingzhao Jing Zhao added a comment -

          Junping, thanks for the comments! You're right, my 002 patch updates numOfAvailableNodes. The new 003 patch removes this part, and also removes some of the test cases from TestReplicationPolicyWithNodeGroup that are not currently in trunk.

          djp Junping Du added a comment -

          Jing, your patch seems to also include a bug fix that updates numOfAvailableNodes after removing nodes in the same NodeGroup from the replica placement candidates. A complete fix is available as a patch on HADOOP-9045. Per Nicholas's suggestion, should we include only the code from HDFS-3498 and HDFS-3601?

          djp Junping Du added a comment -

          Hi Jing, I am currently working on this. Due to the time difference (I am in the +8 time zone), my responses may be delayed. Thanks for your patch, though.

          jingzhao Jing Zhao added a comment -

          Will run unit tests and testpatch tonight.

          jingzhao Jing Zhao added a comment -

          Based on Junping's original patch, I tried to address Nicholas's comments and generated a new patch that may be closer to the current trunk.

          Junping, if you have not worked on this, could you please help review my patch? Otherwise please skip this patch. Thanks!

          szetszwo Tsz Wo Nicholas Sze added a comment -

          > ... Can you take a look at it? ...

          Sure, I have commented on HADOOP-9045.

          We usually use a JIRA for a single issue/bug. The patch can be merged/backported to earlier branches within the same JIRA. In this case, let's keep HDFS-3941 for backporting HDFS-3498 and HDFS-3601, and then fix the bug by HADOOP-9045 (so we will commit HADOOP-9045 to both trunk and branch-1). Sounds good?

          BTW, if you are busy with something, Jing can help out here.

          djp Junping Du added a comment -

          Nicholas, I just filed a related bug against trunk as HADOOP-9045. Can you take a look at it? Thanks! Hopefully, I can gradually eliminate the gap between the trunk and branch-1 patches without refactoring. However, I think the bug fix on branch-1 is something we should keep and try to bring into trunk. Thoughts?

          djp Junping Du added a comment -

          Hi Nicholas. OK, I will file a separate JIRA to fix the bug in trunk. Thanks for the suggestions!

          szetszwo Tsz Wo Nicholas Sze added a comment -

          Hi Junping, the posted patch is quite different from the ones in HDFS-3498 and HDFS-3601. It seems that there is some code refactoring. Could you make the patch look the same as the ones in trunk? For code refactoring, bug fixes, or other additional work, let's do it separately so that the changes will also go to trunk.


            People

            • Assignee:
              djp Junping Du
              Reporter:
              djp Junping Du