Hadoop HDFS / HDFS-16614

Improve balancer operation strategy and performance


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9.2
    • Fix Version/s: None
    • Component/s: balancer & mover, namenode
    • Labels: None

    Description

      When the Balancer runs, it performs the following steps:
      1. Obtain the available datanode information from the NameNode.
      2. Group the storages by StorageType and calculate the average utilization of each type. Combined with the configured threshold, this yields four sets: overUtilized, aboveAvgUtilized, belowAvgUtilized, and underUtilized.
      3. From these sets, determine the sources and targets of the data transfer. A source sends data; a target receives it.
      4. Start the data transfers in parallel.
      These steps are repeated iteratively. Throughout, a single threshold is applied uniformly to all StorageTypes. This is rather coarse: with the heterogeneous storage currently supported, the individual StorageTypes cannot be treated differently.
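      Steps 2 and 3 can be sketched in Python as below. This is a simplified illustration, not the Balancer's actual Java code: the `Storage` record and the `classify` and `candidates` helpers are hypothetical, the average is weighted by capacity, and the threshold is in percentage points. As described above, one threshold is shared by every StorageType. The `candidates` helper simply treats overUtilized and aboveAvgUtilized nodes as potential sources and the other two sets as potential targets; the real Balancer pairs the sets more selectively.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

SETS = ("overUtilized", "aboveAvgUtilized", "belowAvgUtilized", "underUtilized")


class Storage(NamedTuple):
    """One datanode storage (hypothetical shape, not a Hadoop class)."""
    node: str
    storage_type: str  # e.g. "DISK" or "SSD"
    used: int          # bytes used
    capacity: int      # total bytes


def classify(storages: List[Storage], threshold: float) -> Dict[str, Dict[str, List[str]]]:
    """Split storages into the four utilization sets, separately per StorageType.

    The same `threshold` (in percentage points) is applied to every StorageType.
    """
    by_type: Dict[str, List[Storage]] = defaultdict(list)
    for s in storages:
        by_type[s.storage_type].append(s)

    result: Dict[str, Dict[str, List[str]]] = {}
    for stype, group in by_type.items():
        # Capacity-weighted average utilization of this StorageType, in percent.
        avg = 100.0 * sum(s.used for s in group) / sum(s.capacity for s in group)
        sets: Dict[str, List[str]] = {name: [] for name in SETS}
        for s in group:
            diff = 100.0 * s.used / s.capacity - avg
            if diff > 0:  # above average
                name = "overUtilized" if diff > threshold else "aboveAvgUtilized"
            else:  # at or below average
                name = "underUtilized" if -diff > threshold else "belowAvgUtilized"
            sets[name].append(s.node)
        result[stype] = sets
    return result


def candidates(sets: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Potential sources and targets for one StorageType (simplified pairing)."""
    sources = sets["overUtilized"] + sets["aboveAvgUtilized"]
    targets = sets["underUtilized"] + sets["belowAvgUtilized"]
    return sources, targets
```

      For instance, with DISK storages at 90%, 85%, 60% and 45% of equal capacity (average 70%) and a threshold of 10, the first two are overUtilized, the 60% storage is belowAvgUtilized and the 45% storage is underUtilized.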

      We have a production cluster of more than 2000 nodes whose storage is unevenly utilized. For example, the average utilization of the cluster is 78%, but most nodes are between 85% and 90%. When the balancer is started, 85% of the nodes act as sources. We do not think this is reasonable: it consumes a large share of the cluster's network resources, and placing effective limits on it would benefit the cluster's normal workload.
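      The numbers above can be worked through to see why the threshold alone does not limit the number of sources. This assumes the Balancer's default threshold of 10% (the issue does not state which value was used), with u_i the utilization of node i, \bar{u} the cluster average and t the threshold:

```latex
\begin{aligned}
O &= \{\, i : u_i > \bar{u} + t \,\} && \text{overUtilized} \\
A &= \{\, i : \bar{u} < u_i \le \bar{u} + t \,\} && \text{aboveAvgUtilized} \\
\bar{u} + t &= 78\% + 10\% = 88\% \\
u_i \in [85\%, 88\%] &\;\Rightarrow\; i \in A, \qquad u_i \in (88\%, 90\%] \;\Rightarrow\; i \in O
\end{aligned}
```

      Since O \cup A = \{ i : u_i > \bar{u} \} does not depend on t, every node in the 85% to 90% band is a source candidate whatever threshold is chosen. Assuming the usual Balancer pairing (overUtilized nodes matched with underUtilized and belowAvgUtilized targets, aboveAvgUtilized nodes with underUtilized targets), both sets can end up supplying sources.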
      We therefore propose the following changes:
      1. While the balancer is running, it should proactively suggest a threshold value for each StorageType, for example: [[DISK, 10%], [SSD, 8%]...]
      2. Support setting the threshold per StorageType, and apply it when balancing.
      3. Add an option that prevents nodes below the threshold from joining the Source set. This lets nodes with high utilization move data out as soon as possible, which helps the cluster reach balance.
      4. Add support for leaving nodes unchanged when a large share of the datanodes in the cluster have similar utilization. For example, if 40% of the nodes in the cluster are at 75% to 80% utilization, these nodes should not join the Source set. This behavior must be explicitly enabled by the user at runtime.
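      Proposals 2 to 4 can be sketched together as a source-selection filter. Everything here is hypothetical: `choose_sources` and its `thresholds`, `over_utilized_only` and `keep_band` parameters are illustrative names, not existing Balancer options, and item 3 is read as "only overUtilized nodes may be sources", which is one possible interpretation. The fallback threshold of 10 mirrors the Balancer's default threshold.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Storage(NamedTuple):
    """One datanode storage (hypothetical shape, not a Hadoop class)."""
    node: str
    storage_type: str  # e.g. "DISK" or "SSD"
    used: int          # bytes used
    capacity: int      # total bytes


def choose_sources(
    storages: List[Storage],
    thresholds: Dict[str, float],
    default_threshold: float = 10.0,
    over_utilized_only: bool = False,
    keep_band: Optional[Tuple[float, float]] = None,
) -> Dict[str, List[str]]:
    """Return, per StorageType, the nodes allowed to act as sources.

    thresholds         -- per-StorageType threshold in percentage points (item 2)
    over_utilized_only -- only nodes above average + threshold qualify (item 3)
    keep_band          -- (low, high) utilization band left untouched (item 4)
    """
    by_type: Dict[str, List[Storage]] = defaultdict(list)
    for s in storages:
        by_type[s.storage_type].append(s)

    sources: Dict[str, List[str]] = {}
    for stype, group in by_type.items():
        # Capacity-weighted average utilization of this StorageType, in percent.
        avg = 100.0 * sum(s.used for s in group) / sum(s.capacity for s in group)
        threshold = thresholds.get(stype, default_threshold)
        chosen: List[str] = []
        for s in group:
            util = 100.0 * s.used / s.capacity
            if util <= avg:
                continue  # at or below average: never a source
            if over_utilized_only and util - avg <= threshold:
                continue  # aboveAvgUtilized node excluded (item 3)
            if keep_band is not None and keep_band[0] <= util <= keep_band[1]:
                continue  # inside the user-specified "leave unchanged" band (item 4)
            chosen.append(s.node)
        sources[stype] = chosen
    return sources
```

      With ten hypothetical equal-capacity DISK nodes at 85, 86, 87, 88, 89, 90, 85, 50, 60 and 60% (average 78%), the default call returns all seven nodes above average; with `over_utilized_only=True` and a DISK threshold of 10 only the 89% and 90% nodes remain, and with `keep_band=(85.0, 87.0)` the 88%, 89% and 90% nodes remain.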


          People

            Assignee: JiangHua Zhu (jianghuazhu)
            Reporter: JiangHua Zhu (jianghuazhu)
            Votes: 0
            Watchers: 2
