[HADOOP-1187] DFS Scalability: avoid scanning entire list of datanodes in getAdditionalBlocks - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

Each new block allocation for a file scans the list of all known datanodes to determine whether the client is also a cluster node. If so, it tries to allocate a replica locally. This check consumes a significant amount of CPU, especially when the cluster has a large number of datanodes.

An optimization: if the client is also a cluster node, cache a reference to the corresponding DatanodeDescriptor in the file's entry in pendingCreate. The method getAdditionalBlock() then uses the cached DatanodeDescriptor and avoids scanning the entire list of datanodes on every block allocation.
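
The idea can be illustrated with a minimal sketch. This is not the code from the attached patch: the classes below are simplified, hypothetical stand-ins that reuse the names DatanodeDescriptor, pendingCreate and getAdditionalBlock() from the description, while PendingFile, NamesystemSketch, startFile(), findByHost() and the scans counter are assumptions made up for illustration. The sketch assumes the lookup is done once when the file is created and the result is stored with the pending-create entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; not the real Hadoop class.
class DatanodeDescriptor {
    final String host;
    DatanodeDescriptor(String host) { this.host = host; }
}

// Hypothetical entry kept per file under construction.
class PendingFile {
    // Cached at create time; null when the client is not a cluster node.
    final DatanodeDescriptor clientNode;
    PendingFile(DatanodeDescriptor clientNode) { this.clientNode = clientNode; }
}

class NamesystemSketch {
    private final List<DatanodeDescriptor> datanodes = new ArrayList<>();
    private final Map<String, PendingFile> pendingCreate = new HashMap<>();
    int scans = 0; // counts full list scans, for illustration only

    void addDatanode(String host) {
        datanodes.add(new DatanodeDescriptor(host));
    }

    // Linear scan over every known datanode: the expensive step.
    private DatanodeDescriptor findByHost(String host) {
        scans++;
        for (DatanodeDescriptor d : datanodes) {
            if (d.host.equals(host)) {
                return d;
            }
        }
        return null;
    }

    // Scan once when the file is created and remember the result.
    void startFile(String path, String clientHost) {
        pendingCreate.put(path, new PendingFile(findByHost(clientHost)));
    }

    // Each later block allocation reuses the cached reference; no scan.
    DatanodeDescriptor getAdditionalBlock(String path) {
        PendingFile f = pendingCreate.get(path);
        if (f == null) {
            throw new IllegalStateException("no pending create for " + path);
        }
        return f.clientNode; // preferred local replica target, or null
    }
}
```

In this sketch the cost of the scan is paid once per file instead of once per block, so a file with many blocks triggers a single pass over the datanode list.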

Attachments

clientPendingCreate2.patch
03/Apr/07 22:00
4 kB
Dhruba Borthakur

People

Assignee: Dhruba Borthakur

Reporter: Dhruba Borthakur

Votes: 0

Watchers: 0

Dates

Created: 30/Mar/07 21:32

Updated: 08/Jul/09 16:42

Resolved: 04/Apr/07 19:46