[HADOOP-1912] Datanode should support block replacement - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.14.1
Fix Version/s: 0.16.0
Component/s: None
Labels: None

Description

This JIRA adds Data Node support for rebalancing (HADOOP-1652). When a balancer decides to move a block B from source S to destination D, it also chooses a proxy source PS, which holds a replica of B, to speed up the block copy. The block replacement is carried out in the following steps:
1. A block copy command is sent to datanode PS in the format "OP_BLOCK_COPY <block_id_of_B> <source S> <destination D>". It requests PS to copy B to datanode D.
2. PS then transfers block B to datanode D by sending D a block replacement command in the format "OP_BLOCK_REPLACEMENT <block_id_of_B> <source S> <data_of_B>".
3. Datanode D writes block B to its disk and then sends the namenode a blockReceived RPC, informing it that block B has been received and asking it to delete the replica of B on source S if B now has an excess replica.
4. The namenode then adds datanode D to block B's locations in its block map and removes an excess replica of B, preferring to delete the one on datanode S.
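The namenode bookkeeping in steps 3 and 4 can be illustrated with a small in-memory sketch. It assumes a fixed target replication factor and treats the source S as a "delete hint": when D's report leaves B over-replicated, S's replica is removed first, falling back to another holder otherwise. All names here (ToyNamenode, addReplica, blockReceived(long, String, String), moveBlock) are hypothetical and are not Hadoop's actual namenode API; the data copy of steps 1 and 2 is reduced to a comment.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: names are hypothetical, not Hadoop's namenode API.
public class BlockReplacementSketch {

    /** Toy namenode tracking block id -> datanodes that hold a replica. */
    static class ToyNamenode {
        private final int replication;   // assumed target number of replicas per block
        private final Map<Long, Set<String>> blockMap = new HashMap<>();

        ToyNamenode(int replication) {
            this.replication = replication;
        }

        void addReplica(long blockId, String datanode) {
            blockMap.computeIfAbsent(blockId, k -> new LinkedHashSet<>()).add(datanode);
        }

        Set<String> locations(long blockId) {
            return blockMap.getOrDefault(blockId, Set.of());
        }

        /**
         * Steps 3 and 4: `receiver` reports that it now stores the block.
         * `deleteHint` is the source S whose replica should go first if the
         * block is over-replicated.
         */
        void blockReceived(long blockId, String receiver, String deleteHint) {
            Set<String> holders = blockMap.computeIfAbsent(blockId, k -> new LinkedHashSet<>());
            holders.add(receiver);
            while (holders.size() > replication) {
                String victim = holders.contains(deleteHint) && !deleteHint.equals(receiver)
                        ? deleteHint
                        : firstOtherThan(holders, receiver);
                holders.remove(victim);
            }
        }

        private static String firstOtherThan(Set<String> holders, String keep) {
            for (String dn : holders) {
                if (!dn.equals(keep)) {
                    return dn;
                }
            }
            throw new IllegalStateException("no replica can be removed");
        }
    }

    /**
     * Moves `blockId` from `source` to `destination` via `proxy`. Only metadata
     * is modeled: the OP_BLOCK_COPY / OP_BLOCK_REPLACEMENT data transfer is left out.
     */
    static void moveBlock(ToyNamenode nn, long blockId,
                          String source, String proxy, String destination) {
        if (!nn.locations(blockId).contains(proxy)) {
            throw new IllegalArgumentException("proxy " + proxy + " has no replica of block " + blockId);
        }
        // Steps 1-2 would happen here: PS copies B to D.
        // Step 3: D reports the new replica, naming S as the replica to delete.
        nn.blockReceived(blockId, destination, source);
    }

    public static void main(String[] args) {
        ToyNamenode nn = new ToyNamenode(3);
        long b = 42L;
        nn.addReplica(b, "S");
        nn.addReplica(b, "PS");
        nn.addReplica(b, "X");
        moveBlock(nn, b, "S", "PS", "D");
        System.out.println(nn.locations(b));   // prints [PS, X, D]
    }
}
```

With replicas on S, PS and X and a target of three, moving B from S to D via PS leaves replicas on PS, X and D; if the block was not over-replicated, S's replica would simply be kept.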

In addition, each data node has a limited bandwidth for rebalancing; the default is 5MB/s. Throttling is done at both the source and destination sides. Each data node limits the number of concurrent data transfers for rebalancing (counting both sending and receiving) to 5, so in the worst case each data transfer gets a bandwidth of 1MB/s (5MB/s shared by 5 transfers).

Each sender and receiver has a Throttler. The primary method of the class is "throttle( int numOfBytes )", where numOfBytes is the total number of bytes the caller has sent or received since the last call to throttle. The method calculates the caller's I/O rate; if the rate exceeds the bandwidth limit, it sleeps to slow down the data transfer. After it wakes up, it adjusts its bandwidth limit if the number of concurrent data transfers has changed.
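A throttler along these lines could look like the following sketch. It assumes a node-wide budget shared evenly by the active balancing transfers, so each Throttler's limit is the total bandwidth divided by the current number of transfers (5MB/s / 5 = 1MB/s in the worst case), and it measures the I/O rate over a window that restarts whenever that limit changes. Apart from throttle(int numOfBytes), the names here (ThrottlerSketch, BalancingBudget, tryStart, requiredSleepMillis) are hypothetical, and MB is taken as 2^20 bytes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: class and helper names are hypothetical, not Hadoop's API.
public class ThrottlerSketch {

    /** Node-wide rebalancing budget shared by all balancing transfers on one datanode. */
    static class BalancingBudget {
        final long totalBytesPerSec;   // e.g. 5 MB/s
        final int maxConcurrent;       // e.g. 5 transfers (sending + receiving)
        final AtomicInteger active = new AtomicInteger();

        BalancingBudget(long totalBytesPerSec, int maxConcurrent) {
            this.totalBytesPerSec = totalBytesPerSec;
            this.maxConcurrent = maxConcurrent;
        }

        /** Registers a new transfer; returns false if the node is already at maxConcurrent. */
        boolean tryStart() {
            while (true) {
                int n = active.get();
                if (n >= maxConcurrent) return false;
                if (active.compareAndSet(n, n + 1)) return true;
            }
        }

        void finish() { active.decrementAndGet(); }

        /** Even share of the budget for the current number of transfers. */
        long perTransferLimit() {
            return totalBytesPerSec / Math.max(1, active.get());
        }
    }

    /** One per sender or receiver. */
    static class Throttler {
        private final BalancingBudget budget;
        private long bandwidth;          // bytes/s currently allowed for this transfer
        private long windowStartNanos;   // start of the current measurement window
        private long windowBytes;        // bytes sent or received in this window

        Throttler(BalancingBudget budget) {
            this.budget = budget;
            this.bandwidth = budget.perTransferLimit();
            this.windowStartNanos = System.nanoTime();
        }

        long bandwidth() { return bandwidth; }

        /** Time the caller must still wait so that bytes / elapsed does not exceed bytesPerSec. */
        static long requiredSleepMillis(long bytes, long elapsedMillis, long bytesPerSec) {
            long minMillis = bytes * 1000 / bytesPerSec;   // time these bytes should take
            return Math.max(0, minMillis - elapsedMillis);
        }

        /** numOfBytes: bytes sent or received since the previous call. */
        synchronized void throttle(int numOfBytes) throws InterruptedException {
            windowBytes += numOfBytes;
            long elapsedMillis = (System.nanoTime() - windowStartNanos) / 1_000_000;
            long sleepMillis = requiredSleepMillis(windowBytes, elapsedMillis, bandwidth);
            if (sleepMillis > 0) {
                Thread.sleep(sleepMillis);   // rate too high: slow the transfer down
            }
            // After waking, adopt the new share if the number of transfers changed,
            // and restart the window so old traffic is not judged by the new limit.
            long newBandwidth = budget.perTransferLimit();
            if (newBandwidth != bandwidth) {
                bandwidth = newBandwidth;
                windowStartNanos = System.nanoTime();
                windowBytes = 0;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BalancingBudget budget = new BalancingBudget(5L * 1024 * 1024, 5);  // 5 MB/s, 5 transfers
        for (int i = 0; i < 5; i++) {
            budget.tryStart();
        }
        Throttler throttler = new Throttler(budget);
        System.out.println(throttler.bandwidth());   // prints 1048576 (1 MB/s per transfer)
        throttler.throttle(64 * 1024);               // 64 KB at 1 MB/s: sleeps at most about 62 ms
    }
}
```

The sleep time is the time the reported bytes should have taken at the allowed rate minus the time that actually elapsed; for example, 2 MB reported after 500 ms at 1 MB/s yields a 1500 ms sleep.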

Attachments

- replace.patch, 17/Sep/07 23:16, 66 kB, Hairong Kuang
- replace1.patch, 27/Sep/07 22:00, 28 kB, Hairong Kuang
- replace2.patch, 16/Oct/07 22:30, 35 kB, Hairong Kuang
- replace3.patch, 19/Oct/07 19:08, 37 kB, Hairong Kuang
- replace4.patch, 23/Oct/07 04:36, 38 kB, Hairong Kuang
- replace5.patch, 23/Oct/07 23:13, 38 kB, Hairong Kuang
- replace6.patch, 25/Oct/07 18:01, 38 kB, Hairong Kuang

Issue Links

depends upon

- HADOOP-1908 Restructure data node code so that block sending/receiving is seperated from data transfer header handling (Closed)
- HADOOP-2058 Allow adding additional datanodes to MiniDFSCluster (Closed)

is depended upon by

- HADOOP-1652 Rebalance data blocks when new data nodes added or data nodes become full (Closed)
- HADOOP-2012 Periodic verification at the Datanode (Closed)

People

Assignee: Hairong Kuang

Reporter: Hairong Kuang

Votes: 0

Watchers: 0

Dates

Created: 17/Sep/07 22:18

Updated: 02/May/13 02:29

Resolved: 01/Nov/07 18:09