Hadoop HDFS / HDFS-17061

EC: Let data blocks and parity blocks on DNs more balanced



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved


      When choosing a DN on which to place a data block or parity block, the number of data blocks and parity blocks already on that datanode is not taken into consideration. This may lead to uneven traffic load.

      As shown in figure 1, when reading block groups A, B, C, D and E from five different EC files without any missing blocks, only the data blocks are read, so datanodes such as DN1 and DN2 carry a high traffic load, while datanodes such as DN3, DN4 and DN5 may carry little or no traffic load.

      If data blocks and parity blocks are distributed more evenly across DNs, the traffic load in the cluster will be more balanced and the peak traffic load on any single DN will be reduced. Here "balanced" means that the ratio of data blocks to parity blocks on a DN matches the ratio defined by its EC policy. In the ideal state, each DN has a balanced traffic load, as figure 2 shows.

      How can this imbalance be reduced? I think it depends on the EC policy and on the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, it is appropriate to keep the ratio close to 3:2.
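
To make the 3:2 target concrete, here is a minimal sketch of how a per-DN imbalance could be measured. It is not part of HDFS: the function names and the choice of metric (the DN's data-block fraction minus the policy's data-block fraction) are assumptions made for illustration only.

```python
from fractions import Fraction


def target_data_fraction(data_units: int, parity_units: int) -> Fraction:
    """Fraction of blocks that are data blocks under the EC policy.

    For RS-3-2-1024k, data_units=3 and parity_units=2, giving 3/5.
    """
    return Fraction(data_units, data_units + parity_units)


def data_fraction_deviation(data_blocks: int, parity_blocks: int,
                            data_units: int, parity_units: int) -> Fraction:
    """Hypothetical imbalance metric for one datanode.

    Positive: the DN holds more data blocks than the policy ratio suggests
    (likely read hotspot). Negative: the DN is parity-heavy (likely idle
    during healthy reads). Zero: the DN matches the policy ratio.
    """
    total = data_blocks + parity_blocks
    if total == 0:
        # An empty DN is treated as balanced in this sketch.
        return Fraction(0)
    return Fraction(data_blocks, total) - target_data_fraction(
        data_units, parity_units)


if __name__ == "__main__":
    # Block counts below are invented for the example.
    print(data_fraction_deviation(6, 4, 3, 2))  # matches 3:2
    print(data_fraction_deviation(5, 0, 3, 2))  # data-heavy
    print(data_fraction_deviation(0, 5, 3, 2))  # parity-heavy
```

A DN holding 6 data blocks and 4 parity blocks matches the 3:2 ratio exactly, so its deviation is zero; a DN holding only data blocks deviates by +2/5 and one holding only parity blocks by -3/5.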

      There are two possible solutions:
      1. Improve the block placement policy.
      2. Improve the Balancer.
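
As a rough illustration of the first option, the sketch below picks, among candidate datanodes, the one whose current data-to-parity mix would benefit most from the new block. It is an assumption-laden toy, not the HDFS block placement policy: `choose_datanode`, its parameters and the block counts are all hypothetical, and it ignores the rack, storage and load constraints a real placement policy must honor.

```python
from fractions import Fraction
from typing import Dict, Tuple


def choose_datanode(candidates: Dict[str, Tuple[int, int]],
                    is_data_block: bool,
                    data_units: int,
                    parity_units: int) -> str:
    """Pick a DN for a new block (hypothetical selection rule).

    candidates maps a DN name to (data_blocks, parity_blocks) already on it.
    A data block goes to the most parity-heavy DN, and a parity block goes
    to the most data-heavy DN, measured against the EC policy ratio.
    """
    target = Fraction(data_units, data_units + parity_units)

    def deviation(name: str) -> Fraction:
        data, parity = candidates[name]
        total = data + parity
        # Empty DNs count as balanced in this sketch.
        return Fraction(data, total) - target if total else Fraction(0)

    names = sorted(candidates)  # sorted for a deterministic tie-break
    if is_data_block:
        return min(names, key=deviation)
    return max(names, key=deviation)


if __name__ == "__main__":
    # Invented block counts for three DNs under RS-3-2-1024k.
    dns = {"DN1": (6, 0), "DN3": (0, 6), "DN4": (3, 2)}
    print(choose_datanode(dns, True, 3, 2))   # DN for a data block
    print(choose_datanode(dns, False, 3, 2))  # DN for a parity block
```

With these invented counts, a new data block goes to the parity-only DN3 and a new parity block goes to the data-only DN1. The same deviation measure could in principle guide the Balancer in deciding which blocks to move, but that is outside this sketch.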





            Assignee: Unassigned
            Reporter: WangYuanben



