HDFS-1312 (Hadoop HDFS)

Re-balance disks within a Datanode

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: The Disk Balancer lets administrators rebalance data across multiple disks of a DataNode. It is useful to correct skewed data distribution often seen after adding or replacing disks. Disk Balancer can be enabled by setting dfs.disk.balancer.enabled to true in hdfs-site.xml. It can be invoked by running "hdfs diskbalancer". See the "HDFS Diskbalancer" section in the HDFS Commands guide for detailed usage.
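
As an illustration of the setting named in the release note, the sketch below embeds a sample hdfs-site.xml fragment and checks whether dfs.disk.balancer.enabled is explicitly set to true. Only the property name and value come from the release note; the helper name `disk_balancer_enabled`, the sample file, and the choice to treat a missing property as "not enabled" are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative hdfs-site.xml content. Only the property name and its value
# are taken from the release note; the rest of the file is a placeholder.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.disk.balancer.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def disk_balancer_enabled(hdfs_site_xml: str) -> bool:
    """Return True if dfs.disk.balancer.enabled is explicitly set to true."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == "dfs.disk.balancer.enabled":
            return value == "true"
    # Property absent: treated here as "not explicitly enabled". This is an
    # assumption of the sketch; the effective default depends on the Hadoop
    # version in use.
    return False


if __name__ == "__main__":
    print(disk_balancer_enabled(SAMPLE_HDFS_SITE))
```

A check like this could run before invoking "hdfs diskbalancer" on a node, so that a disabled feature is caught in configuration rather than at execution time.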

    Description

      Filing this issue in response to "full disk woes" on hdfs-user.

      Datanodes fill their storage directories unevenly, leading to situations where certain disks are full while others are significantly less used. Users at many different sites have experienced this issue, and HDFS administrators are taking steps like:

      • Manually rebalancing blocks in storage directories
      • Decommissioning nodes and later re-adding them

      There's a tradeoff between making use of all available spindles and filling disks at roughly the same rate. Possible solutions include:

      • Weighting less-used disks more heavily when placing new blocks on the datanode. In write-heavy environments this will still make use of all spindles, equalizing disk use over time.
      • Rebalancing blocks locally. This would help equalize disk use as disks are added/replaced in older cluster nodes.
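
The first option above, weighting less-used disks when placing new blocks, can be sketched as a weighted random choice in which each disk's weight is its free space. This is a minimal illustration under stated assumptions, not the datanode's actual volume-choosing code: the function name `choose_volume`, the disk names, and the use of free bytes as the weight are all hypothetical.

```python
import random
from typing import Dict, Optional


def choose_volume(free_bytes: Dict[str, int], block_size: int,
                  rng: Optional[random.Random] = None) -> str:
    """Pick a disk for a new block, favoring disks with more free space.

    Every disk that can hold the block stays eligible, so all spindles keep
    receiving writes, but emptier disks are chosen proportionally more often.
    """
    rng = rng if rng is not None else random.Random()
    # Skip disks that cannot fit the block (or have no free space at all).
    candidates = {v: f for v, f in free_bytes.items()
                  if f >= block_size and f > 0}
    if not candidates:
        raise ValueError("no volume has room for the block")
    volumes = list(candidates)
    weights = [candidates[v] for v in volumes]
    return rng.choices(volumes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical datanode with one nearly full and one nearly empty disk.
    free = {"disk1": 900, "disk2": 100}
    print(choose_volume(free, block_size=10))
```

Because the choice is random rather than "always the emptiest disk", a freshly added disk does not absorb every write, which is the spindle-utilization side of the tradeoff described above.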

      Datanodes should actively manage their local disks so operator intervention is not needed.
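
One way a datanode could manage its disks without operator help is to compute a move plan that brings every disk to the node-wide average utilization. The sketch below is a simple greedy planner written for illustration only; it is not the algorithm from the attached design documents, and the name `plan_moves`, the disk names, and the byte counts are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_moves(used: Dict[str, int],
               capacity: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (source, destination, bytes) moves that even out utilization."""
    if used.keys() != capacity.keys():
        raise ValueError("used and capacity must describe the same disks")
    # Node-wide utilization that every disk should end up at.
    target_ratio = sum(used.values()) / sum(capacity.values())
    # Positive surplus: disk holds more than its share. Negative: less.
    surplus = {d: used[d] - round(target_ratio * capacity[d]) for d in used}
    donors = [[d, s] for d, s in sorted(surplus.items()) if s > 0]
    receivers = [[d, -s] for d, s in sorted(surplus.items()) if s < 0]

    moves: List[Tuple[str, str, int]] = []
    i = j = 0
    # Pair each over-full disk with under-full disks until one side runs out.
    while i < len(donors) and j < len(receivers):
        amount = min(donors[i][1], receivers[j][1])
        moves.append((donors[i][0], receivers[j][0], amount))
        donors[i][1] -= amount
        receivers[j][1] -= amount
        if donors[i][1] == 0:
            i += 1
        if receivers[j][1] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    used = {"disk1": 900, "disk2": 100, "disk3": 200}
    capacity = {"disk1": 1000, "disk2": 1000, "disk3": 1000}
    for src, dst, nbytes in plan_moves(used, capacity):
        print(f"move {nbytes} bytes from {src} to {dst}")
```

A real implementation would move whole blocks rather than arbitrary byte counts and would throttle I/O so that rebalancing does not starve client reads and writes; both concerns are left out of this sketch.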

      Attachments

        1. disk-balancer-proposal.pdf (328 kB, Anu Engineer)
        2. Architecture_and_testplan.pdf (218 kB, Anu Engineer)
        3. Architecture_and_test_update.pdf (273 kB, Anu Engineer)
        4. HDFS-1312.001.patch (739 kB, Anu Engineer)
        5. HDFS-1312.002.patch (739 kB, Anu Engineer)
        6. HDFS-1312.003.patch (738 kB, Arpit Agarwal)
        7. HDFS-1312.004.patch (743 kB, Anu Engineer)
        8. HDFS-1312.005.patch (743 kB, Arpit Agarwal)
        9. HDFS-1312.006.patch (743 kB, Anu Engineer)
        10. HDFS-1312.007.patch (744 kB, Arpit Agarwal)

            People

              Assignee: Anu Engineer (aengineer)
              Reporter: Travis Crawford (traviscrawford)
              Votes: 27
              Watchers: 107
