[HADOOP-1306] DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

One of the most frequently invoked RPCs on the namenode is addBlock(). The DFSClient calls it each time it needs to allocate another block for a file it is currently writing. Reducing the number of addBlock() calls would lower the load on the namenode and improve its scalability. One idea we want to discuss here is to make addBlock() return more than one block per call. This proposal came out of a discussion I had with Ben Reed.

Let's say that addBlock() returns n blocks for the file. The namenode already tracks these blocks in the pendingCreates data structure. The client guarantees that it will use these n blocks in order. The client also guarantees that if it cannot use a block (for whatever reason), it will inform the namenode via the abandonBlock() RPC. Both of these RPCs are already supported.
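
A minimal Java sketch of the client-side bookkeeping this protocol implies, assuming a hypothetical batched call `addBlocks(src, n)` that returns block IDs in the order the client must use them. `BatchBlockAllocator`, `PreallocatedBlocks`, `CountingNamenode`, and the use of plain `long` block IDs are illustrative inventions, not the real `ClientProtocol`; only the idea of a batched addBlock() plus abandonBlock() for unused blocks comes from the proposal above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical batched allocation interface; NOT the real Hadoop ClientProtocol. */
interface BatchBlockAllocator {
    /** Allocates n new blocks for file src, in the order the client must use them. */
    List<Long> addBlocks(String src, int n);

    /** Tells the namenode the client will never write this block. */
    void abandonBlock(long blockId, String src);
}

/** Client-side queue of pre-allocated blocks for one file being written. */
class PreallocatedBlocks {
    private final BatchBlockAllocator namenode;
    private final String src;
    private final int batchSize;
    private final Deque<Long> pending = new ArrayDeque<>();

    PreallocatedBlocks(BatchBlockAllocator namenode, String src, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.namenode = namenode;
        this.src = src;
        this.batchSize = batchSize;
    }

    /** Returns the next block in order, issuing an RPC only when the local batch is used up. */
    long nextBlock() {
        if (pending.isEmpty()) {
            pending.addAll(namenode.addBlocks(src, batchSize));
            if (pending.isEmpty()) {
                throw new IllegalStateException("namenode allocated no blocks");
            }
        }
        return pending.removeFirst();
    }

    /** On close (or failure), abandons every pre-allocated block that was never used. */
    void close() {
        while (!pending.isEmpty()) {
            namenode.abandonBlock(pending.removeFirst(), src);
        }
    }
}

/** In-memory fake namenode that counts allocation RPCs, for illustration only. */
class CountingNamenode implements BatchBlockAllocator {
    long nextId = 1;
    int addBlockRpcs = 0;
    final List<Long> abandoned = new ArrayList<>();

    public List<Long> addBlocks(String src, int n) {
        addBlockRpcs++;
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(nextId++);
        }
        return ids;
    }

    public void abandonBlock(long blockId, String src) {
        abandoned.add(blockId);
    }
}
```

With a batch size of 3, writing seven blocks issues three allocation RPCs instead of seven, and `close()` abandons the two pre-allocated blocks (IDs 8 and 9) that were never used. In general, a file of B blocks would need ⌈B/n⌉ allocation calls instead of B.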

Another possible optimization: since the namenode has to allocate n blocks for a file anyway, should it use the same set of datanodes for all of them? My proposal is that if n is small (e.g. 3), it is prudent to place all replicas of this set of blocks on the same datanodes. This would reduce the CPU spent in chooseTargets(), which would then run once per batch instead of once per block.
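
A sketch of the shared-target idea, assuming a hypothetical `TargetChooser` interface standing in for the namenode's chooseTargets() and datanodes identified by plain strings. `BatchPlacement` and its `shareThreshold` parameter are illustrative names. When the batch size is at most the threshold (3 in the example above), the chooser runs once and every block in the batch reuses the same targets; larger batches fall back to choosing targets per block.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical placement policy standing in for the namenode's chooseTargets(). */
interface TargetChooser {
    /** Returns `replication` datanode names to host one block's replicas. */
    List<String> chooseTargets(int replication);
}

final class BatchPlacement {
    private BatchPlacement() {}

    /**
     * Returns one target list per block in a batch of n blocks.
     * If n <= shareThreshold, the chooser is invoked once and its result is reused
     * for every block; otherwise targets are chosen independently for each block.
     */
    static List<List<String>> placeBatch(TargetChooser chooser, int n, int replication, int shareThreshold) {
        List<List<String>> placements = new ArrayList<>(n);
        if (n <= shareThreshold) {
            // A single placement decision for the whole batch.
            List<String> shared =
                Collections.unmodifiableList(new ArrayList<>(chooser.chooseTargets(replication)));
            for (int i = 0; i < n; i++) {
                placements.add(shared);
            }
        } else {
            // Large batch: one placement decision per block, as without batching.
            for (int i = 0; i < n; i++) {
                placements.add(chooser.chooseTargets(replication));
            }
        }
        return placements;
    }
}
```

Sharing targets trades placement diversity for CPU: every block in a shared batch lands on the same datanodes, so keeping the threshold small bounds how much data is concentrated on one pipeline.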

Attachments

fineGrainLocks3.patch (42 kB), attached by Dhruba Borthakur on 16/May/07 16:25

People

Assignee: Unassigned

Reporter: Dhruba Borthakur

Votes: 0

Watchers: 0

Dates

Created: 30/Apr/07 18:49

Updated: 08/Jul/09 16:42

Resolved: 05/Jul/07 17:56