HDFS data might not always be be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
The balancer is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster. The tool is deployed as an application program that can be run by the cluster administrator on a live HDFS cluster while applications adding and deleting files.
DESCRIPTION
The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.
The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10G bytes or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanodes information from the namenode.
A system property that limits the balancer's use of bandwidth is defined in the default configuration file:
<property> <name>dfs.balance.bandwidthPerSec</name> <value>1048576</value> <description> Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second. </description> </property>
This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.
MONITERING BALANCER PROGRESS
After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.
Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.
The balancer automatically exits when any of the following five conditions is satisfied:
Upon exit, a balancer returns an exit code and prints one of the following messages to the output file in corresponding to the above exit reasons:
The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.
To start: bin/start-balancer.sh [-threshold] Example: bin/ start-balancer.sh start the balancer with a default threshold of 10% bin/ start-balancer.sh -threshold 5 start the balancer with a threshold of 5% To stop: bin/ stop-balancer.sh
Where, threshold - is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become.
This feature is tool that balances disk space usage on an HDFS cluster when some data nodes become full or when new empty nodes join the cluster. Bug in this feature could cause
All balancer tests are run with a threshold of 10% unless otherwise noted. All nodes have the same capacity unless otherwise noted. Test cases 3, 4, and 5 are automatic while all the other cases are manual. All tests are expected to meet the following requirements unless otherwise noted.
| Id | Type of Test | Description | Expected Behavior | Is Automated |
| Balancer_01 | Positive | Start balancer and check if the cluster is balanced after the run. | Cluster should be in balanced state | No |
| Balancer_02 | Positive | Test a cluster with even distribution, then a new empty node is added to the cluster. | Balancer should automatically start balancing cluster by loading data on empty cluster | No |
| Balancer_03 | Positive | Bring up a one-node dfs cluster. Set files’ replication factor to be 1 and fill up the node to 30% full. Then add an empty data node | Old node is 25% utilized and the new node is 5% utilized. | Yes |
| Balancer_04 | Positive | The same as 03 except that the empty new data node is on a different rack. | The same as 03 | Yes |
| Balancer_05 | Positive | The same as 03 except that the empty new data node is half of the capacity as the old one. | Old one is 25% utilized and the new one is 10% utilized | Yes |
| Balancer_06 | Positive | Bring up a 2-node cluster and fill one node to be 60% and the other to be 10% full. All nodes are on different racks. | One node is 40% utilized and the other one is 30% utilized | No |
| Balancer_07 | Positive | Bring up a dfs cluster with nodes A and B. Set files’ replication factor to be 2 and fill up the cluster to 30% full. Then add an empty data node C. All three nodes are on the same rack. | Old ones are 25% utilized and the new one is 10% | No |
| Balancer_08 | Positive | The same as test case 7 except that A, B, and C are on different racks. | The same as above | No |
| Balancer_09 | Positive | The same as test case 7 except that interrupt rebalancing. | The cluster is less imbalanced | |
| Balancer_10 | Positive | Restart rebalancing until it is done. | The same as 7 | No |
| Balancer_11 | Positive | The same as test case 7 except that shut down namenode while rebalancing | Rebalancing is interrupted | No |
| Balancer_12 | Positive | The same as test case 5 except that writing while rebalancing. | he cluster most likely becomes balanced, but may fluctuate | No |
| Balancer_13 | Positive | The same as test case 5 except that deleting while rebalancing | The same as above | No |
| Balancer_14 | Positive | The same as test case 5 except that writing & deleting while rebalancing | The same as above | No |
| Balancer_15 | Positive | Scalability test: populate a 750-node cluster
|
Cluster becomes balanced; File I/O’s performance should not be noticeable slower. | No |
| Balancer_16 | Positive | Start balancer with negative threshold value. | Command execution error output =
|
No |
| Balancer_17 | Positive | Start balancer with out-of-range threshold value. e.g ( -123, 0, -324 , 100000, -1222222 , 1000000000, -10000 , 345, 989 ) | Exit with error message | No |
| Balancer_18 | Positive | Start balancer with alpha-numeric threshold value (e.g 103dsf , asd234 ,asfd ,ASD , #$asd , 2345& , $35 , %34). | Exit with error message | No |
| Balancer_19 | Positive | Start 2 instances of balancer on the same gateway. | Exit with error message | No |
| Balancer_20 | Positive | Start 2 instances of balancer on two different gateways | Exit with error message | No |
| Balancer_21 | Positive | Start balancer when the cluster is already balanced. | Balancer should print information about all nodes in cluster and exit with status of Cluster is balanced. | No |
| Balancer_22 | Positive | Running the balancer with half the data nodes not running | No | |
| Balancer_23 | Positive | Running the balancer and simultaneously simulating load on the cluster with half the data nodes not running | No |
First set up a 3 node cluster with nodes NA, NB, and NC, which are on different racks. Then create a file with one block B with a replication factor 3. Finally add a new node ND to the cluster on the same rack as NC.
| Id | Type of Test | Description | Expected Behavior | Is Automated |
| ProtocolTest_01 | Positive | Copy block B from ND to NA with del hint NC | Fail because proxy source ND does not have the block | No |
| ProtocolTest_02 | Positive | Copy block B from NA to NB with del hint N | Fail because the destination NB contains the block | No |
| ProtocolTest_02 | Positive | Copy block B from NA to ND with del hint NB | Succeed now block B is on NA, NC, and ND | No |
| ProtocolTest_02 | Positive | Copy block B from NB to NCwith del hint NA | NA Succeed but NA is not a valid del hint. So block B is on NA and NB. The third replica is either on NC or ND | No |
| Id | Type of Test | Description | Expected Behavior | Is Automated |
| ThrottlingTest_01 | Positive | Create a throttler with 1MB/s bandwidth. Send 6MB data, and throttle at 0.5MB, 0.75MB, and in the end. | Actual bandwidth should be less than or equal to the expected bandwidth 1MB/s | No |
Set up a 2-node cluster and create a file with a length of 2 blocks and a replication factor of 2
| Id | Type of Test | Description | Expected Behavior | Is Automated |
| NamenodeProtocolTest_01 | Positive | Get blocks from datanode 0 with a size of 2 blocks. | Actual bandwidth should be less than or equal to the expected bandwidth 1MB/s | No |
| NamenodeProtocolTest_02 | Positive | Get blocks from datanode 0 with a size of 1 block. | Return 1 block | No |
| NamenodeProtocolTest_03 | Positive | Get blocks from datanode 0 with a size of 0 . | Receive an IOException | No |
| NamenodeProtocolTest_04 | Positive | Get blocks from datanode 0 with a size of -1. | Receive an IOException | No |
| NamenodeProtocolTest_05 | Positive | Get blocks from a non-existent datanode. | Receive an IOException | No |