  1. Hadoop HDFS
  2. HDFS-5238 Improve NN operation throughput
  3. HDFS-7967

Reduce the performance impact of the balancer

Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      The balancer needs to query for blocks to move from overly full DNs. The block lookup is extremely inefficient. An iterator of the node's blocks is created from the iterators of its storages' blocks. A random number is chosen corresponding to how many blocks will be skipped via the iterator. Each skip requires costly scanning of triplets.
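      The cost of the skip-based lookup described above can be illustrated with a minimal sketch. It is not the NameNode code: `SkipCostSketch`, `nodeIterator`, and `skip` are hypothetical names, and plain lists of block IDs stand in for the triplets structure. It only shows that reaching a random offset through a chained iterator means stepping over every block before it.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SkipCostSketch {
    /** Chains the per-storage block iterators into one node-level iterator. */
    public static Iterator<Long> nodeIterator(List<List<Long>> storages) {
        final Iterator<List<Long>> outer = storages.iterator();
        return new Iterator<Long>() {
            private Iterator<Long> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next storage that still has blocks.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = outer.next().iterator();
                }
                return inner.hasNext();
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    /**
     * Skips up to {@code offset} blocks and returns how many were stepped over.
     * The work is linear in the offset; in this sketch each step is cheap,
     * whereas the issue describes each real skip as a costly scan.
     */
    public static int skip(Iterator<Long> it, int offset) {
        int steps = 0;
        while (steps < offset && it.hasNext()) {
            it.next();
            steps++;
        }
        return steps;
    }
}
```

      With a randomly chosen offset, the expected number of steps grows with the number of blocks on the node, which is why the lookup becomes expensive on full DNs.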

      The current design also considers only imbalances between nodes and ignores imbalances among a node's storages. A more efficient and intelligent design could select blocks round-robin from the storages, based on their remaining capacity, which may eliminate the costly skipping of blocks.
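      One possible reading of the round-robin idea is sketched below. It is an assumption, not the attached patch: `RoundRobinSketch`, `Storage`, and `pickBlocks` are hypothetical names, and "based on remaining capacity" is interpreted here as visiting the fullest storage (least remaining bytes) first. Each pass takes at most one block per storage, so no blocks are skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class RoundRobinSketch {
    /** Hypothetical stand-in for one storage of a DataNode; not an HDFS class. */
    public static final class Storage {
        final long remainingBytes;
        final List<Long> blockIds;

        public Storage(long remainingBytes, List<Long> blockIds) {
            this.remainingBytes = remainingBytes;
            this.blockIds = blockIds;
        }
    }

    /** Picks up to {@code wanted} blocks, one per storage per pass. */
    public static List<Long> pickBlocks(List<Storage> storages, int wanted) {
        // Assumption: fuller storages (less remaining capacity) are visited first.
        List<Storage> ordered = new ArrayList<>(storages);
        ordered.sort(Comparator.comparingLong((Storage s) -> s.remainingBytes));

        List<Iterator<Long>> cursors = new ArrayList<>();
        for (Storage s : ordered) {
            cursors.add(s.blockIds.iterator());
        }

        List<Long> picked = new ArrayList<>();
        boolean progressed = true;
        // Stop when enough blocks are picked or every storage is exhausted.
        while (picked.size() < wanted && progressed) {
            progressed = false;
            for (Iterator<Long> cursor : cursors) {
                if (picked.size() >= wanted) {
                    break;
                }
                if (cursor.hasNext()) {
                    picked.add(cursor.next());
                    progressed = true;
                }
            }
        }
        return picked;
    }
}
```

      In this sketch each block returned costs a single iterator step, and the selection is spread across the node's storages instead of draining them in iteration order.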

      Attachments

        1. HDFS-7967.branch-2.001.patch (33 kB, Daryn Sharp)
        2. HDFS-7967.branch-2.002.patch (33 kB, Daryn Sharp)
        3. HDFS-7967.branch-2.8.001.patch (35 kB, Daryn Sharp)
        4. HDFS-7967.branch-2.8.002.patch (36 kB, Daryn Sharp)
        5. HDFS-7967.branch-2.8.003.patch (36 kB, Daryn Sharp)
        6. HDFS-7967.branch-2.8-1.patch (35 kB, Daryn Sharp)
        7. HDFS-7967.branch-2-1.patch (33 kB, Daryn Sharp)
        8. HDFS-7967-branch-2.8.patch (36 kB, Daryn Sharp)
        9. HDFS-7967-branch-2.patch (33 kB, Daryn Sharp)

            People

              Assignee: Daryn Sharp
              Reporter: Daryn Sharp
              Votes: 1
              Watchers: 23
