Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the namenode is loading the image, a significant amount of time is spent in DFSUtil.string2Bytes. We have a very specific workload here: the path that the namenode calls getPathComponents for shares N - 1 components with the previous path this method was called for (assuming the current path has N components).
      Hence we can improve the image load time by caching the result of the previous conversion.
      We thought of using a simple LRU cache for components, but in practice String.getBytes gets optimized at runtime and an LRU cache doesn't perform as well; however, keeping just the latest path's components and their byte translations in two arrays gives quite a performance boost.
      I could get another 20% off the time to load the image on our cluster (30 seconds vs. 24), and I wrote a simple benchmark that tests performance with and without caching.
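      To make the idea concrete, here is a minimal sketch of the approach (hypothetical helper class and names, illustrative only, not the code in the attached patches): the previous path's component strings and their byte[] translations are kept in two parallel arrays, and any component that matches the previous path at the same position reuses the cached conversion.

        import java.nio.charset.StandardCharsets;

        class CachedPathComponents {
          // Components of the most recently converted path and their byte translations.
          private String[] lastComponents = new String[0];
          private byte[][] lastBytes = new byte[0][];

          byte[][] getPathComponents(String path) {
            String[] components = path.split("/", -1); // simplistic split for the sketch
            byte[][] result = new byte[components.length][];
            for (int i = 0; i < components.length; i++) {
              if (i < lastComponents.length && components[i].equals(lastComponents[i])) {
                result[i] = lastBytes[i]; // shared with the previous path: reuse the cached bytes
              } else {
                result[i] = components[i].getBytes(StandardCharsets.UTF_8); // convert only the new tail
              }
            }
            lastComponents = components; // remember only the latest path, no LRU bookkeeping
            lastBytes = result;
            return result;
          }
        }

      Since image loading visits paths in an order where long shared prefixes are the norm, a single-entry cache like this avoids LRU bookkeeping while still hitting on most components.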

      1. HDFS-1140.patch
        4 kB
        Dmytro Molkov
      2. HDFS-1140.2.patch
        13 kB
        Dmytro Molkov
      3. HDFS-1140.3.patch
        13 kB
        Dmytro Molkov
      4. HDFS-1140.4.patch
        13 kB
        Dmytro Molkov

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #334 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/334/)
        HDFS-1140. Speedup INode.getPathComponents. Contributed by Dmytro Molkov.

        Konstantin Shvachko added a comment -

        I just committed this. Thank you Dmytro.

        Konstantin Shvachko added a comment -

        I filed HDFS-1284 and HDFS-1285 to address two other test failures. I checked javaDoc warnings locally, don't see anything related to this jira.

        Todd Lipcon added a comment -

        I opened HDFS-1286. Unfortunately I'm leaving for a vacation on Tuesday and am pretty booked between now and then, so may not get to it until the end of this month. Thanks Konstantin.

        Konstantin Shvachko added a comment -

        Todd, thanks for looking. Here is another report that also failed the same test cases. I don't see it now on my box either, but you can check the logs to understand what is going on, and probably model. The message "The directory is already locked." means that the previous DN is still running or did not release the lock on the directory.
        If we could isolate TestFileAppend4 into a separate jira, then this one can be closed.

        Todd Lipcon added a comment -

        Hmm, TestFileAppend4 passes for me on a trunk checkout. It seems like some test that ran prior to it didn't close resources properly? Does it fail on your machine?

        Konstantin Shvachko added a comment -

        I am not sure. My guess is based on this log.
        I am saying somebody should investigate, if possible.

        Tsz Wo Nicholas Sze added a comment -

        > I believe this because org.apache.hadoop.hdfs.util.GSet links java.util.Map and Set in its javaDocs. ...

        Konstantin, are you sure? The GSet patch was committed in May and there is no javadoc warning for quite a few Hudson builds, e.g. this, this and this.

        Konstantin Shvachko added a comment -

        Dmytro.
        There is one javaDoc warning saying

        [javadoc] javadoc: warning - Error fetching URL: http://java.sun.com/javase/6/docs/api/package-list

        I believe this is because org.apache.hadoop.hdfs.util.GSet links java.util.Map and Set in its javaDocs. Maybe because they are not imported in the module. This is definitely not related to your patch.

        There are 4 test failures:

        • org.apache.hadoop.hdfs.TestFileAppend4.testRecoverFinalizedBlock
        • org.apache.hadoop.hdfs.TestFileAppend4.testCompleteOtherLeaseHoldersFile
        • org.apache.hadoop.hdfs.security.token.block.TestBlockToken.testBlockTokenRpc
        • org.apache.hadoop.hdfs.server.common.TestJspHelper.testGetUgi

        I see TestBlockToken and TestJspHelper failing on Hudson from time to time. We should file a jira to fix them.
        I don't see anybody reporting failures of TestFileAppend4. Maybe Todd should look at it, as it seems the failures are due to not closing or freeing some resources.
        Could you please investigate these issues?

        Dmytro Molkov added a comment -

        My patch doesn't have anything to do with all of those -1s.
        I reran hudsonQA locally:

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448256/HDFS-1140.4.patch
        against trunk revision 959874.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        Once Hudson approves.

        Konstantin Shvachko added a comment -

        +1. The patch looks good.
        I will commit it.

        Dmytro Molkov added a comment -

        Thanks for your comments, Konstantin.
        I addressed all of them in a new version of the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12447511/HDFS-1140.3.patch
        against trunk revision 957669.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/200/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/200/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/200/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/200/console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        Some review comments:

        1. FSImage.isParent(String, String) is not used, please remove.
        2. Could you please add separators between the methods and javaDoc descriptions for the new methods if possible.
        3. INode.getPathFromComponents() should be DFSUtil.byteArray2String().
        4. TestPathComponents should use junit 4 style rather than junit 3.
        5. I'd advise to reuse U_STR instead of allocating DeprecatedUTF8 buff directly in FSImage.loadFSImage().
          In order to do that you can provide a convenience method similar to readString() or readBytes():
          static byte[][] readPathComponents(DataInputStream in) throws IOException {
            U_STR.readFields(in);
            return DFSUtil.bytes2byteArray(U_STR.getBytes(), U_STR.getLength(), (byte)Path.SEPARATOR_CHAR);
          }
          

          The idea was to remove DeprecatedUTF8 at some point, so it is better to keep this stuff in one place right after the declaration of U_STR.

        6. It does not look like FSDirectory.addToParent(String src ...) is used anywhere anymore. Could you please verify and remove it if so.
        7. Same with INodeDirectory.addToParent(String path, ...) - can we eliminate it too?
        Dmytro Molkov added a comment -

        I removed one array copy from the routine. This should speed it up even more.
        I will submit this for tests right away.

        Eli Collins added a comment -

        Makes sense. I was mostly curious to get your thoughts on how hard it would be to use byte[] throughout. It's probably not worth refactoring the code to have, e.g., INode#name be an index into a path byte[].

        Dmytro Molkov added a comment -

        Eli, you are right, this patch already moves us from more or less user-friendly String passing to byte[][] passing. However, I do not really see how we can avoid those copies. The first one is due to the nature of Writable: if you do not copy the bytes, the array you end up with can contain, at the end, leftover bytes from earlier reads in addition to the path currently read. You probably could extend bytes2byteArray to take an offset and length into the given byte array and perform the split on that region (see the sketch below).
        The second copy is also more or less unavoidable (or I do not know a good way to avoid it), since we need to end up with a byte[][] array. The problem with using a flat byte[] lies in how we traverse the tree of directories to find the INode a path points to: eventually, when you call INodeDirectory.getChildINode, you need the byte[] representation of the name of the child you are looking for.
        Right now, as far as I understand, every piece of code inside the NameNode relies on the byte[][] representation of the path, where each part is the byte[] representation of an INode name. I am not sure how we can change this.
        I can look into making bytes2byteArray more flexible to get rid of one byte[] copy.

        Does all of this make sense? I will make other changes shortly.
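        For illustration, a rough sketch of what a more flexible bytes2byteArray taking an offset and length might look like (hypothetical signature and body, not the committed DFSUtil code; root-path special cases are omitted):

          // Split the region [offset, offset + len) of buf on the separator byte,
          // so a Writable's reused backing array can be consumed without first
          // copying the current path's bytes out of it.
          static byte[][] bytes2byteArray(byte[] buf, int offset, int len, byte separator) {
            int splits = 0;
            for (int i = offset; i < offset + len; i++) {
              if (buf[i] == separator) {
                splits++;
              }
            }
            byte[][] result = new byte[splits + 1][];
            int start = offset;
            int index = 0;
            for (int i = offset; i <= offset + len; i++) {
              if (i == offset + len || buf[i] == separator) {
                result[index] = new byte[i - start];
                System.arraycopy(buf, start, result[index], 0, i - start);
                index++;
                start = i + 1;
              }
            }
            return result;
          }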

        Eli Collins added a comment -

        Hey Dmytro,

        Definitely an improvement. I noticed there's still a lot of copying going on: readBytes copies the string's bytes into a byte array, then bytes2byteArray copies that byte array into another byte array (it's hard for bytes2byteArray to use readBytes without copying). Would it make sense to go whole hog and just use the byte[] representation of a path internally? I understand that's a large change, but it would remove a bunch of copies, and since this change is all about using a less user-friendly abstraction in the name of reducing overhead, it might be worth considering.

        • Do we need to add the new addToParent to preserve the old String-based API? Would be nice to have FSImage use a single representation of a path.
        • bytes2byteArray could use a javadoc.
        • Adding and using the following helper function as you've done with isParent would help readability.
          boolean isRoot(byte[][] pathComp) {
            return pathComp.length == 1 && pathComp[0].length == 0;
          }

        Dmytro Molkov added a comment -

        Something changed in the string to byte[][] conversion of the path since the patch was generated. Looking at it.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12444464/HDFS-1140.2.patch
        against trunk revision 946488.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/178/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/178/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/178/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/178/console

        This message is automatically generated.

        Dmytro Molkov added a comment -

        Submitting this for tests. Can someone review the patch itself?

        Dmytro Molkov added a comment -

        @Hairong. Well, I agree with you that the conversion to String is currently unnecessary. I guess I was trying to make the argument that the format of the path in the image and the format of the path in memory could potentially differ if someone changes it; in that case, having a String representation in the middle might simplify things.
        Anyway, since currently the byte representation is the same, it does make sense to operate on the byte arrays right from the start.
        Please see the patch attached. It doesn't convert the read bytes to String and introduces a code path that inserts a node based on the byte[][] representation right from the start. Let me know if you have further comments.

        Hairong Kuang added a comment -

        > two different abstractions that are communicating via String paths
        No, I do not think so. What matters is the encoding. The goal is to convert what's on disk to the Java UTF8 encoding as stored in memory. Currently it happens that what's on disk uses the same encoding as what's in memory. Why bother converting to String first?

        Dmytro Molkov added a comment -

        Well, I completely agree now, but since storing the image file on disk and storing the in-memory state are currently two somewhat different abstractions that communicate via String paths, this seemed like a clean way to get some performance improvement.

        Hairong Kuang added a comment -

        All paths are stored as bytes in memory. In theory, we do not need to convert bytes to string and then to bytes when loading fsimage. But this needs a lot of re-organization of our code.

        Dmytro Molkov added a comment -

        Please have a look at this patch.
        Currently I turn the cache off right after the image is loaded, although that might not be totally correct. I analyzed our audit logs on the production cluster, and if we used this caching technique at runtime there would be a ~30-33% hit ratio on parts of the path, which may give some improvement in performance.
        So I am open to suggestions.

        I ran the benchmark tool and got these results:
        Old: 70008 vs. Cached: 41523
        which gives a ~40% speedup on conversion.


          People

          • Assignee:
            Dmytro Molkov
            Reporter:
            Dmytro Molkov
          • Votes:
            0
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:
