  1. Hadoop HDFS
  2. HDFS-8945

Update the description about replica placement in HDFS Architecture documentation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: documentation
    • Labels:
      None

      Description

      The description of replica placement should be updated to cover:

      • storage types and storage policies
      • the placement policy for a replication factor greater than 4
      1. HDFS-8945.001.patch
        2 kB
        Masatake Iwasaki
      2. HDFS-8945.002.patch
        3 kB
        Masatake Iwasaki

        Activity

        hadoopqa Hadoop QA added a comment -



        +1 overall

        Vote  Subsystem      Runtime  Comment
        0     pre-patch      2m 54s   Pre-patch trunk compilation is healthy.
        +1    @author        0m 0s    The patch does not contain any @author tags.
        +1    release audit  0m 20s   The applied patch does not increase the total number of release audit warnings.
        +1    site           3m 0s    Site still builds.
        +1    whitespace     0m 0s    The patch has no lines that end in whitespace.
                             6m 17s

        Subsystem       Report/Notes
        Patch URL       http://issues.apache.org/jira/secure/attachment/12751993/HDFS-8945.001.patch
        Optional Tests  site
        git revision    trunk / feaf034
        Java            1.7.0_55
        uname           Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/12087/console

        This message was automatically generated.

        andrew.wang Andrew Wang added a comment -

        Good idea to update this, thanks for working on it. A few comments; apologies, but I did not dust off BlockPlacementPolicyDefault for a re-read (exercise for the author):

        • For 4+ replicas, since we've already guaranteed multi-rack with the first 3, I thought the 4th+ are just pure random. The wording makes it sound like we do something more, but not exactly what.
        • "was added" rather than "are added"
        • What wins when storage and rack awareness conflict? e.g. I specify ALL_SSD and all the SSD nodes are on the same rack. There are a number of precedence rules in BPP that are not documented like this. If you think documenting this is too confusing, can just delete this bit.
        iwasakims Masatake Iwasaki added a comment -

        Thanks for the comments, Andrew Wang!

        "For 4+ replicas, since we've already guaranteed multi-rack with the first 3, I thought the 4th+ are just pure random."

        BlockPlacementPolicyDefault#isGoodDatanode checks that the number of replicas in the same rack is under the limit given by BlockPlacementPolicyDefault#getMaxNodesPerRack (which was added by HDFS-2576).

        int maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2;
        

        The limit prevents the remaining replicas from all being allocated in the same rack.
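To make the arithmetic concrete, here is a minimal sketch of the expression quoted above. The class name `MaxNodesPerRackSketch` and the standalone method are illustrative only; this is not the Hadoop source, and the real `getMaxNodesPerRack` may apply further adjustments.

```java
/** Illustration of the per-rack replica cap quoted in this comment; not the Hadoop source. */
final class MaxNodesPerRackSketch {

    /**
     * Computes the cap with the quoted expression.
     * Integer division is intentional, as in the original line.
     */
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        if (numOfRacks <= 0) {
            throw new IllegalArgumentException("numOfRacks must be positive");
        }
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    public static void main(String[] args) {
        System.out.println(maxNodesPerRack(3, 3)); // prints 2
        System.out.println(maxNodesPerRack(9, 3)); // prints 4
    }
}
```

With 3 replicas on 3 racks the cap is 2, so the replicas cannot all land on one rack; with 9 replicas on 3 racks the cap is 4.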

        In addition, an experiment using the code of TestDefaultBlockPlacementPolicy showed me that setting the replication factor to the total number of nodes in the cluster does not always result in replicas being placed on all nodes.

        I changed the number of nodes in the mini cluster to 9: /RACK0 has 6 nodes, /RACK2 has 2, and /RACK3 has 1.

            final String[] racks = { "/RACK0", "/RACK0", "/RACK2", "/RACK3", "/RACK2", "/RACK0", "/RACK0", "/RACK0", "/RACK0" };
            final String[] hosts = { "/host0", "/host1", "/host2", "/host3", "/host4", "/host5", "/host6", "/host7", "/host8" };
        

        When I added code to create a file with replication factor 9, I always got 7 replicas, located as shown below, because maxNodesPerRack is 4 in this case. This is an unusual case, though, in which nodes are not evenly distributed among racks.

        /RACK0
        /RACK0
        /RACK0
        /RACK0
        /RACK2
        /RACK2
        /RACK3
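The count of 7 can be reproduced with a simplified model: each rack contributes at most min(nodes in rack, maxNodesPerRack) replicas. The sketch below assumes that model only; `RackCapSketch` and `placeableReplicas` are hypothetical names, and the real policy applies other checks (for example node load and free space) that are ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of the experiment above; not the Hadoop placement code. */
final class RackCapSketch {

    /** Per-rack cap, using the expression quoted earlier in this thread. */
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    /** Upper bound on replicas that can be placed when each rack is capped. */
    static int placeableReplicas(String[] racks, int replication) {
        // Count the nodes in each rack; one array entry corresponds to one node.
        Map<String, Integer> nodesPerRack = new LinkedHashMap<>();
        for (String rack : racks) {
            nodesPerRack.merge(rack, 1, Integer::sum);
        }
        int cap = maxNodesPerRack(replication, nodesPerRack.size());
        int total = 0;
        for (int nodes : nodesPerRack.values()) {
            total += Math.min(nodes, cap); // a rack cannot hold more than the cap
        }
        return Math.min(total, replication);
    }

    public static void main(String[] args) {
        String[] racks = { "/RACK0", "/RACK0", "/RACK2", "/RACK3", "/RACK2",
                           "/RACK0", "/RACK0", "/RACK0", "/RACK0" };
        System.out.println(placeableReplicas(racks, 9)); // prints 7
    }
}
```

Here /RACK0 is capped at 4 of its 6 nodes, /RACK2 contributes 2 and /RACK3 contributes 1, which gives 4 + 2 + 1 = 7.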
        
        iwasakims Masatake Iwasaki added a comment -

        Hmm... The storage policy wins, because rack awareness always falls back to random placement, whereas the storage policy is a must.

        BlockPlacementPolicyDefault first looks for a node based on network topology, then checks that the chosen node has the required storage type. If the node does not satisfy the storage condition, the next node is selected according to the rack-awareness rules. If not enough replicas are found in the first pass, it looks in a second pass for nodes that have the fallback storage types defined in the policy.
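The two-pass behaviour described above can be sketched roughly as follows. This is a toy model, not BlockPlacementPolicyDefault: the `Node` class, the `choose` method, and the idea of passing candidates already ordered by topology are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of "required storage type first, fallback types second". */
final class TwoPassPlacementSketch {

    /** Hypothetical node description: a name plus a single storage type. */
    static final class Node {
        final String name;
        final String storageType;

        Node(String name, String storageType) {
            this.name = name;
            this.storageType = storageType;
        }
    }

    /** Candidates are assumed to arrive in the order the topology rules would try them. */
    static List<String> choose(List<Node> candidates, int wanted,
                               String required, Set<String> fallbacks) {
        List<String> chosen = new ArrayList<>();
        // First pass: accept only nodes that have the required storage type.
        for (Node n : candidates) {
            if (chosen.size() == wanted) break;
            if (n.storageType.equals(required)) chosen.add(n.name);
        }
        // Second pass: fill the remainder from nodes with a fallback storage type.
        for (Node n : candidates) {
            if (chosen.size() == wanted) break;
            if (!chosen.contains(n.name) && fallbacks.contains(n.storageType)) {
                chosen.add(n.name);
            }
        }
        return chosen;
    }

    /** Small fixed candidate list used by the demo in main. */
    static List<Node> demoCandidates() {
        return Arrays.asList(new Node("host0", "SSD"), new Node("host1", "DISK"),
                             new Node("host2", "SSD"), new Node("host3", "DISK"));
    }

    public static void main(String[] args) {
        Set<String> disk = new HashSet<>(Arrays.asList("DISK"));
        System.out.println(choose(demoCandidates(), 3, "SSD", disk));            // prints [host0, host2, host1]
        System.out.println(choose(demoCandidates(), 3, "SSD", new HashSet<>())); // prints [host0, host2]
    }
}
```

In this model, a policy with no fallback storage types yields fewer replicas than requested rather than placing a replica on the wrong storage type, which is the sense in which the storage policy takes precedence.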

        iwasakims Masatake Iwasaki added a comment -

        I'm updating the patch based on the comments. It would be better to add an explanation that the replica placement policy is pluggable: implement BlockPlacementPolicy and set the class name via dfs.block.replicator.classname.

        iwasakims Masatake Iwasaki added a comment -

        I attached 002. I did not add an explanation of BlockPlacementPolicy because it is a private API.

        hadoopqa Hadoop QA added a comment -



        +1 overall

        Vote  Subsystem      Runtime  Comment
        0     pre-patch      3m 2s    Pre-patch trunk compilation is healthy.
        +1    @author        0m 0s    The patch does not contain any @author tags.
        +1    release audit  0m 21s   The applied patch does not increase the total number of release audit warnings.
        +1    site           2m 56s   Site still builds.
        +1    whitespace     0m 0s    The patch has no lines that end in whitespace.
                             6m 23s

        Subsystem       Report/Notes
        Patch URL       http://issues.apache.org/jira/secure/attachment/12752660/HDFS-8945.002.patch
        Optional Tests  site
        git revision    trunk / 4cbbfa2
        Java            1.7.0_55
        uname           Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/12155/console

        This message was automatically generated.

        andrew.wang Andrew Wang added a comment -

        My bad for leaving this for so long, +1 LGTM will commit shortly.

        andrew.wang Andrew Wang added a comment -

        Committed to trunk and branch-2, thanks for the contribution Masatake Iwasaki!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8711 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8711/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #588 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/588/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #600 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/600/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        iwasakims Masatake Iwasaki added a comment -

        Thanks for the reviews, Andrew Wang!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #1324 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1324/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2478 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2478/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2531 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2531/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #541 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/541/)
        HDFS-8945. Update the description about replica placement in HDFS (wang: rev e8aefdf08bc79a0ad537c1b7a1dc288aabd399b9)

        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          People

          • Assignee:
            iwasakims Masatake Iwasaki
            Reporter:
            iwasakims Masatake Iwasaki
          • Votes:
            0
            Watchers:
            4
