Hadoop HDFS / HDFS-3941

Backport HDFS-3498 and HDFS-3601: update replica placement policy for the newly added "NodeGroup" layer topology

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2.0, 1-win
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed
    • Target Version/s:

      Description

      With the additional "NodeGroup" layer enabled, the replica placement policy used in BlockPlacementPolicyWithNodeGroup is updated to follow these rules:
      0. No more than one replica is placed within a NodeGroup.
      1. The first replica is placed on the local node.
      2. The second and third replicas are placed on the same rack as each other, which is a different (remote) rack from the first replica's.
      3. Other replicas are placed on random nodes, with the restriction that no more than two replicas are placed in the same rack if there are enough racks.
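
      As an illustration only, rules 0 to 3 above can be written as a small checker over an ordered list of replica locations. This is a sketch, not the code in the patch: the Node type, the violations function, and the reading of "enough racks" as "two replicas per rack would suffice" are all assumptions made for this example.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence


class Node(NamedTuple):
    """Hypothetical stand-in for a datanode's position in the topology."""
    rack: str
    node_group: str
    name: str


def violations(writer: Node, replicas: Sequence[Node], total_racks: int) -> List[str]:
    """List which of rules 0-3 an ordered replica placement breaks."""
    problems = []

    # Rule 0: no more than one replica within a node group.
    per_group = Counter((r.rack, r.node_group) for r in replicas)
    if any(count > 1 for count in per_group.values()):
        problems.append("rule 0")

    # Rule 1: the first replica is on the local (writer) node.
    if replicas and replicas[0] != writer:
        problems.append("rule 1")

    # Rule 2: the second replica is on a remote rack,
    # and the third replica shares the second replica's rack.
    if len(replicas) >= 2 and replicas[1].rack == replicas[0].rack:
        problems.append("rule 2")
    elif len(replicas) >= 3 and replicas[2].rack != replicas[1].rack:
        problems.append("rule 2")

    # Rule 3: at most two replicas per rack when there are enough racks.
    # Assumption: "enough" means two replicas per rack could hold them all.
    per_rack = Counter(r.rack for r in replicas)
    enough_racks = 2 * total_racks >= len(replicas)
    if enough_racks and any(count > 2 for count in per_rack.values()):
        problems.append("rule 3")

    return problems
```

      For example, a placement with the first replica on the writer node and the next two on different node groups of one remote rack yields an empty list, while putting the second replica in the writer's own node group reports rules 0 and 2.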

      This patch also abstracts the replica removal policy from FSNamesystem into BlockPlacementPolicy, and updates the removal policy slightly so that, when a block is over-replicated, replicas duplicated within the same NodeGroup are removed first.
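
      The "duplicates within a NodeGroup first" preference can be sketched as a filter over the current replicas. This is a minimal illustration under assumptions, not the patch's implementation: the tuple layout and the removal_candidates name are hypothetical, and how one replica is finally chosen among the candidates is not described in this issue, so it is left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout for this sketch: (rack, node_group, node_name).
Replica = Tuple[str, str, str]


def removal_candidates(replicas: Sequence[Replica]) -> List[Replica]:
    """Replicas to consider deleting first when a block is over-replicated."""
    by_group: Dict[Tuple[str, str], List[Replica]] = defaultdict(list)
    for replica in replicas:
        rack, node_group, _ = replica
        by_group[(rack, node_group)].append(replica)

    # Prefer replicas that share a node group with another replica.
    duplicated = [
        replica
        for members in by_group.values()
        if len(members) > 1
        for replica in members
    ]
    # If no node group holds more than one replica, every replica is a candidate.
    return duplicated if duplicated else list(replicas)
```

      With two replicas in one node group and a third elsewhere, only the two co-located replicas are returned, so removing either of them restores rule 0 of the placement policy.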

      1. HDFS-3941.005.patch
        53 kB
        Jing Zhao
      2. HDFS-3941.004.patch
        51 kB
        Jing Zhao
      3. HDFS-3941.003.patch
        49 kB
        Jing Zhao
      4. HDFS-3941.002.patch
        56 kB
        Jing Zhao
      5. HDFS-3941.patch
        57 kB
        Junping Du

          Activity

          Tsz Wo Nicholas Sze added a comment -

          Hi Junping, the posted patch is quite different from the ones in HDFS-3498 and HDFS-3601. It seems that there is some code refactoring. Could you make the patch look the same as the ones in trunk? For code refactoring, bug fixes, or other additional work, let's do it separately so that the changes will also go to trunk.

          Junping Du added a comment -

          Hi Nicholas. OK, I will file a separate JIRA to fix the bug in trunk. Thanks for the suggestions!

          Junping Du added a comment -

          Nicholas, I just filed a related bug against trunk as HADOOP-9045. Can you take a look at it? Thanks! Hopefully, I can gradually eliminate the gap between the trunk and branch-1 patches without refactoring. However, I think the bug fix on branch-1 is something we should keep and try to bring to trunk. Thoughts?

          Tsz Wo Nicholas Sze added a comment -

          > ... Can you take a look at it? ...

          Sure, I have commented on HADOOP-9045.

          We usually use a JIRA for a single issue/bug. The patch can be merged/backported to earlier branches within the same JIRA. In this case, let's keep HDFS-3941 for backporting HDFS-3498 and HDFS-3601, and then fix the bug in HADOOP-9045 (so we will commit HADOOP-9045 to both trunk and branch-1). Sounds good?

          BTW, if you are busy with something, Jing can help out here.

          Jing Zhao added a comment -

          Based on Junping's original patch, I tried to address Nicholas's comments and generated a new patch that may be closer to the current trunk.

          Junping, if you have not worked on this, could you please help review my patch? Otherwise please skip this patch. Thanks!

          Jing Zhao added a comment -

          Will run unit tests and testpatch tonight.

          Junping Du added a comment -

          Hi Jing, I am currently working on this. Due to the time difference (I am in UTC+8), my responses may be delayed. Thanks for your patch, though.

          Junping Du added a comment -

          Jing, your patch also seems to include a bug fix that updates numOfAvailableNodes after nodes in the same node group are removed from the replica placement candidates. A complete fix is available as a patch on HADOOP-9045. Per Nicholas's suggestion, should we include only the code from HDFS-3498 and HDFS-3601?

          Jing Zhao added a comment -

          Junping, thanks for the comments! You're right, my 002 patch updates numOfAvailableNodes. The new 003 patch removes this part, and also removes the test cases in TestReplicationPolicyWithNodeGroup that are not currently in trunk.

          Jing Zhao added a comment -

          Changed the names of the three new methods in BlockPlacementPolicy back to be compatible with the current trunk.

          Jing Zhao added a comment -

          Updated the patch to make it more compatible with trunk. I have run the unit tests locally, and all test cases passed except TestNNThroughputBenchmark (reported in HDFS-4204).

          Tsz Wo Nicholas Sze added a comment -

          +1 the 005 patch looks good.

          Tsz Wo Nicholas Sze added a comment -

          I have committed it. Thanks, Junping and Jing!

          Junping Du added a comment -

          Thanks Nicholas and Jing!

          Harsh J added a comment -

          This change is in trunk and has made it to branch-1, but it did not go into any branch-2 release, which is a feature regression between the two lines. Can it be backported to branch-2 as well after discussion, or removed from branch-1 if there is disagreement?

          Junping Du added a comment -

          Harsh, we reached an agreement to backport it to branch-2 by reusing the JIRAs in trunk (some discussion is in HDFS-4261). I will start the backport work once I have finished the YARN part of the work on trunk (YARN-18, YARN-19). Does this plan make sense to you?

          Harsh J added a comment -

          Thanks, that would be okay. Can we have a JIRA tracking the backport as well (we can reopen/reuse this)?

          Junping Du added a comment -

          Per HDFS-4261, both Aaron and Suresh suggested we can reuse the JIRAs in trunk (HDFS-3498 and HDFS-3601). I think we can simply close this JIRA here if you are OK. Thanks.

          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.


            People

            • Assignee: Junping Du
            • Reporter: Junping Du
            • Votes: 0
            • Watchers: 9
