Apache Ozone / HDDS-4656

Add a container balancer tool or service for HDDS

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Ozone Datanode, SCM, Tools
    • Labels: None

    Description

      When an existing Ozone cluster is nearly full, we have to add more datanodes to it, but we then face two issues.

      • When a new container allocation request arrives, SCM should prefer datanodes with low usage; otherwise, performance will become poor.
      • For read requests, the existing datanodes store most of the blocks, so they remain responsible for serving reads and streaming the data, while the newly added datanodes cannot help at all.

      If we had a balancer tool like the HDFS balancer, we could move blocks or containers from highly utilized datanodes to less utilized ones. I think this is a necessary tool for Ozone.

      Container balancer design doc: https://docs.google.com/document/d/15PdYaP6aLB18ptbcOK3XWlL4Y1PLfIP-ll30KjNPZ2g/edit?usp=sharing
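
To make the idea in the description concrete, here is a minimal sketch of how datanodes might be split into "over-utilized" and "under-utilized" groups before any container is moved. It is not the Ozone implementation: `DatanodeUsage`, `classify`, and the `threshold` parameter are hypothetical names chosen for illustration, and the rule used (compare each node's utilization with the cluster-wide average plus or minus a threshold) is an assumption, not something stated in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatanodeUsage:
    """Hypothetical usage record for one datanode (not an Ozone class)."""
    name: str
    capacity_bytes: int
    used_bytes: int

    @property
    def utilization(self) -> float:
        # Fraction of capacity in use; an empty-capacity node counts as 0.0.
        if self.capacity_bytes == 0:
            return 0.0
        return self.used_bytes / self.capacity_bytes


def classify(nodes, threshold=0.10):
    """Return (over, under): nodes above/below the cluster average by more than threshold.

    Assumption: the cluster average is total used bytes over total capacity.
    """
    total_capacity = sum(n.capacity_bytes for n in nodes)
    if total_capacity == 0:
        return [], []
    average = sum(n.used_bytes for n in nodes) / total_capacity

    over = [n for n in nodes if n.utilization > average + threshold]
    under = [n for n in nodes if n.utilization < average - threshold]

    # Fullest candidate sources first, emptiest candidate targets first.
    over.sort(key=lambda n: n.utilization, reverse=True)
    under.sort(key=lambda n: n.utilization)
    return over, under


# Three nearly full datanodes plus one freshly added, empty datanode.
cluster = [
    DatanodeUsage("dn1", 100, 90),
    DatanodeUsage("dn2", 100, 90),
    DatanodeUsage("dn3", 100, 90),
    DatanodeUsage("dn4-new", 100, 0),
]
sources, targets = classify(cluster, threshold=0.10)
print([n.name for n in sources], [n.name for n in targets])
```

In this example the three old datanodes end up as candidate sources and the new datanode as the only candidate target, which matches the scenario in the description. A real balancer would additionally have to pick which containers to move and cap how much data leaves a source or enters a target per iteration.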

      Attachments

        1. Container Balancer Design.pdf
          128 kB
          Lokesh Jain
        2. Container Move.pdf
          62 kB
          Lokesh Jain
        3. Ozone Balancer HA.pdf
          651 kB
          Siddhant Sangwan
        4. Replication Manager V2.pdf
          149 kB
          Siddhant Sangwan
        5. Ratis vs EC for Container Balancer.pdf
          50 kB
          Siddhant Sangwan

      Sub-Tasks

        1. Introduce ContainerBalancer in SCM with start/stop capabilities. (Sub-task, Resolved, Siddhant Sangwan)
        2. Support start/stop for container balancer via command line (Sub-task, Resolved, Jie Yao)
        3. Determine over and under utilized datanodes in Container Balancer (Sub-task, Resolved, Siddhant Sangwan)
        4. Support container move in Replication Manager (Sub-task, Resolved, Jie Yao)
        5. Select target datanodes and containers to move for Container Balancer (Sub-task, Resolved, Siddhant Sangwan)
        6. ContainerBalancer should use remaining space to calculate utilization. (Sub-task, Resolved, Siddhant Sangwan)
        7. Support container move HA (Sub-task, Resolved, Jie Yao)
        8. Balancer iterations should run on updated utilisation info from datanodes (Sub-task, Resolved, Siddhant Sangwan)
        9. ContainerBalancer#checkConditionsForBalancing pre-emptively checks iteration limits. (Sub-task, Resolved, Siddhant Sangwan)
        10. make it configurable to choose the nearest one as the target in the candidates according to networkTopology (Sub-task, Resolved, Jie Yao)
        11. Stop ContainerBalancer when SCM is stopped. (Sub-task, Resolved, Siddhant Sangwan)
        12. support Optional<T> as parameters of commandLine (Sub-task, Resolved, Jie Yao)
        13. ContainerBalancer#stop should prevent the current balancing thread from interrupting itself. (Sub-task, Resolved, Siddhant Sangwan)
        14. ContainerBalancer should get OzoneConfiguration from ContainerBalancerConfiguration. (Sub-task, Resolved, Siddhant Sangwan)
        15. support -1 for running balancer infinitely (Sub-task, Resolved, Jie Yao)
        16. make it configurable to trigger refresh datanode usage info before start a new balance iteration (Sub-task, Resolved, Jie Yao)
        17. support setting maxSizeEnteringTarget and maxSizeLeavingSource in command line (Sub-task, Resolved, Jie Yao)
        18. Incorrect calculation of iteration related metrics in ContainerBalancer (Sub-task, Resolved, Siddhant Sangwan)
        19. add a command to trigger datanode executing "du" immediately (Sub-task, Resolved, Jie Yao)
        20. Reset default values in ContainerBalancerConfiguration (Sub-task, Resolved, Siddhant Sangwan)
        21. balancer should stop when the cluster can not be balanced any more (Sub-task, Resolved, Jie Yao)
        22. ContainerBalancer incorrectly exits an iteration early without checking move results. (Sub-task, Resolved, Siddhant Sangwan)
        23. Support setting Datanode Reserved Space in MiniOzoneCluster. (Sub-task, Resolved, Siddhant Sangwan)
        24. Fix ContainerBalancerConfiguration annotation string (Sub-task, Resolved, Siddhant Sangwan)
        25. Support configuration for including/excluding datanodes for balancing (Sub-task, Resolved, Siddhant Sangwan)
        26. Improve defaults in ContainerBalancerConfiguration (Sub-task, Resolved, Siddhant Sangwan)
        27. ContainerBalancerConfig doesn't read config from ozone-site.xml (Sub-task, Resolved, Janus Chow)
        28. sourceToTargetMap in ContainerBalancer doesn't support multiple entries with same source. (Sub-task, Resolved, Siddhant Sangwan)
        29. ContainerBalancer metrics don't show updated values in JMX (Sub-task, Resolved, Siddhant Sangwan)
        30. Support Container Balancer HA (Sub-task, Resolved, Siddhant Sangwan)
        31. Add aggregate metrics to ContainerBalancerMetrics (Sub-task, Resolved, Siddhant Sangwan)
        32. ContainerBalancer shows incorrect iteration result sometimes (Sub-task, Resolved, Siddhant Sangwan)
        33. Implement ContainerBalancer as an SCMService (Sub-task, Resolved, Siddhant Sangwan)
        34. Add metric for failed container moves (Sub-task, Resolved, Sumit Agrawal)
        35. Test "size.leaving.source.max" limit in ContainerBalancer (Sub-task, Resolved, Sumit Agrawal)
        36. Continuous start & stop can have hanging threads in stopping (Sub-task, Resolved, Sumit Agrawal)
        37. ContainerBalancer need have separate class handling management and balancing activity (Sub-task, Resolved, Sumit Agrawal)
        38. Improve logging in Container Balancer (Sub-task, Resolved, Siddhant Sangwan)

          People

            Assignee: Baolong Mao (maobaolong)
            Reporter: Baolong Mao (maobaolong)
            Votes: 3
            Watchers: 13
