[HDFS-11535] Performance analysis of new DFSNetworkTopology#chooseRandom - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha4
Component/s: namenode
Labels: None
Hadoop Flags: Reviewed

Description

This JIRA was created to post the results of some performance experiments we ran. For those who are interested, please see the attached .pdf file for more details. The attached patch file includes the experiment code.

The key insight from these tests is that although the new method outperforms the current one in most cases, there is still one case where the current one is better: when there is only one storage type in the cluster and we always look for that storage type. In this case, performing storage-type-based pruning is simply a waste of time; blindly picking a random node (the current method) suffices.
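To make the single-storage-type case concrete, here is a hedged Java sketch of a "blind" strategy: pick a uniformly random node and retry if its storage type does not match. The names `BlindPickSketch`, `Leaf`, `Pick` and `chooseRandomBlind`, and the retry loop capped by `maxTrials`, are hypothetical simplifications, not the HDFS code. In this sketch, when every node has the wanted type the first pick always succeeds, so any per-type pruning work would be pure overhead.

```java
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a "blind" random pick with retry. StorageType and Leaf
 * are simplified stand-ins, not the real HDFS classes.
 */
public class BlindPickSketch {
    public enum StorageType { DISK, SSD, ARCHIVE }

    public record Leaf(String name, StorageType type) {}

    /** Result of one search: the chosen leaf (null if none) and how many picks it took. */
    public record Pick(Leaf leaf, int trials) {}

    /** Picks uniformly random leaves until one has the wanted type or maxTrials is reached. */
    public static Pick chooseRandomBlind(List<Leaf> leaves, StorageType wanted,
                                         Random rng, int maxTrials) {
        for (int trial = 1; trial <= maxTrials; trial++) {
            Leaf candidate = leaves.get(rng.nextInt(leaves.size()));
            if (candidate.type() == wanted) {
                return new Pick(candidate, trial);
            }
        }
        return new Pick(null, maxTrials); // gave up: no matching leaf found
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        List<Leaf> allDisk = List.of(
                new Leaf("dn1", StorageType.DISK),
                new Leaf("dn2", StorageType.DISK),
                new Leaf("dn3", StorageType.DISK));
        // Single storage type, searching for that type: always succeeds on the first pick.
        System.out.println(chooseRandomBlind(allDisk, StorageType.DISK, rng, 100).trials()); // prints 1
    }
}
```

When the wanted type is rare, the same loop may need many picks (or give up), which is the situation where pruning by storage type pays off.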

Therefore, based on the analysis, we propose to use a combination of both the old and the new methods:

Say we search for a node of type X. Since every inner node now keeps storage type info, we can simply check the root node to see whether X is the only type it has. If yes, blindly picking a random leaf will work, so we call the old method; otherwise, we call the new method.
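A minimal Java sketch of the proposed hybrid, assuming a simplified tree in which every inner node aggregates per-storage-type leaf counts. `HybridChooseRandomSketch`, `Node`, `chooseRandomBlind`, `chooseRandomPruned`, and the count-weighted descent used for pruning are illustrative assumptions, not the actual `DFSNetworkTopology` code or API.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical, simplified sketch of the hybrid chooseRandom; not the HDFS classes. */
public class HybridChooseRandomSketch {
    public enum StorageType { DISK, SSD, ARCHIVE }

    public static final class Node {
        public final String name;
        public final StorageType type; // non-null only for leaves (datanodes)
        final List<Node> children = new ArrayList<>();
        // Per-storage-type leaf counts: the storage type info each inner node keeps.
        final Map<StorageType, Integer> counts = new EnumMap<>(StorageType.class);

        private Node(String name, StorageType type) {
            this.name = name;
            this.type = type;
            if (type != null) {
                counts.put(type, 1);
            }
        }

        public static Node leaf(String name, StorageType type) {
            return new Node(name, type);
        }

        public static Node inner(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) {
                n.children.add(k);
                // Aggregate the child's per-type counts into this inner node.
                k.counts.forEach((t, c) -> n.counts.merge(t, c, Integer::sum));
            }
            return n;
        }

        boolean isLeaf() { return type != null; }

        int count(StorageType t) { return counts.getOrDefault(t, 0); }

        int totalLeaves() { return counts.values().stream().mapToInt(Integer::intValue).sum(); }
    }

    /** "Old" method: uniform random leaf, ignoring storage types. */
    static Node chooseRandomBlind(Node root, Random rng) {
        int index = rng.nextInt(root.totalLeaves());
        Node node = root;
        while (!node.isLeaf()) {
            for (Node child : node.children) {
                int size = child.totalLeaves();
                if (index < size) { node = child; break; }
                index -= size;
            }
        }
        return node;
    }

    /** "New" method: descend, weighting each child by its count of the wanted type. */
    static Node chooseRandomPruned(Node root, StorageType wanted, Random rng) {
        if (root.count(wanted) == 0) {
            return null; // no leaf of this type anywhere in the tree
        }
        Node node = root;
        while (!node.isLeaf()) {
            int r = rng.nextInt(node.count(wanted));
            for (Node child : node.children) {
                int c = child.count(wanted); // subtrees with c == 0 are never entered
                if (r < c) { node = child; break; }
                r -= c;
            }
        }
        return node;
    }

    /** Hybrid: skip pruning when the wanted type is the only type under the root. */
    public static Node chooseRandom(Node root, StorageType wanted, Random rng) {
        if (root.counts.size() == 1 && root.counts.containsKey(wanted)) {
            return chooseRandomBlind(root, rng);
        }
        return chooseRandomPruned(root, wanted, rng);
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        Node mixed = Node.inner("/",
                Node.inner("/rack1", Node.leaf("dn1", StorageType.DISK), Node.leaf("dn2", StorageType.SSD)),
                Node.inner("/rack2", Node.leaf("dn3", StorageType.DISK), Node.leaf("dn4", StorageType.DISK)));
        System.out.println(chooseRandom(mixed, StorageType.SSD, rng).name); // prints dn2 (pruned path)

        Node diskOnly = Node.inner("/",
                Node.inner("/rack1", Node.leaf("dn1", StorageType.DISK), Node.leaf("dn2", StorageType.DISK)));
        System.out.println(chooseRandom(diskOnly, StorageType.DISK, rng).type); // prints DISK (blind path)
    }
}
```

The dispatch decision only inspects the root's aggregated counts, so choosing between the two paths is cheap; mixed-type clusters still go through the pruned descent.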

There is still at least one missing piece in this performance test: garbage collection. The new method creates a few more objects during the search, which adds GC overhead. I'm still thinking about potential optimizations, but this seems tricky, and I'm not sure whether such an optimization is worth doing at all. Please feel free to leave any comments or suggestions.

Thanks arpitagarwal and szetszwo for the offline discussion.

Attachments

HDFS-11535.001.patch (16 kB), 15/Mar/17 19:56, Chen Liang
HDFS-11535.002.patch (18 kB), 20/Mar/17 10:36, Yiqun Lin
HDFS-11535.003.patch (17 kB), 03/Apr/17 22:25, Chen Liang
HDFS-11535.004.patch (19 kB), 22/May/17 23:26, Chen Liang
PerfTest.pdf (192 kB), 15/Mar/17 19:54, Chen Liang

Issue Links

relates to: HDFS-15295 AvailableSpaceBlockPlacementPolicy should use chooseRandomWithStorageTypeTwoTrial() for better performance. (Resolved)

People

Assignee: Chen Liang

Reporter: Chen Liang

Votes: 0

Watchers: 8

Dates

Created: 15/Mar/17 19:53

Updated: 22/Apr/20 15:07

Resolved: 23/May/17 03:27