[HDFS-4817] make HDFS advisory caching configurable on a per-file basis - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.0-alpha1
Fix Version/s: 2.2.0
Component/s: hdfs-client
Labels: None

Description

HADOOP-7753 and related JIRAs introduced several performance optimizations for the DataNode. One of them is readahead: when it is enabled, the DataNode starts reading the bytes of the block file it expects to need next, before the client requests them. This helps hide the latency of rotational media and sends larger reads down to the device. Another optimization is "drop-behind," which removes file data from the Linux page cache once it is no longer needed.

Enabling dfs.datanode.drop.cache.behind.writes and dfs.datanode.drop.cache.behind.reads can improve performance substantially on many MapReduce jobs. In our internal benchmarks, we have seen speedups of 40% on certain workloads. The reason is that when we know the block data will not be read again any time soon, keeping it out of the page cache leaves more memory for the other processes on the system. See HADOOP-7714 for more benchmarks.

We would like to be able to turn on these settings on a per-file or per-client basis, rather than for the DataNode as a whole. This will allow more users to actually make use of them. It would also be good to add unit tests for the drop-cache code path, to ensure that it functions as we expect.
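To make the per-file / per-client / DataNode-wide layering concrete, here is a minimal sketch of how an effective setting could be resolved. It is only a model of the idea, not the code in the attached patches: the class `AdvisoryCachingSketch`, the `Hints` holder, and the `resolve` method are hypothetical names, and the precedence shown (per-file first, then per-client, then the DataNode default) is an assumption about the intended behavior.

```java
/**
 * Hypothetical model of advisory-caching precedence. None of these names
 * come from Hadoop; they exist only to illustrate the layering.
 */
final class AdvisoryCachingSketch {

    /** A pair of hints; a null field means "no opinion at this level". */
    static final class Hints {
        final Boolean dropBehind;   // drop data from the page cache after use
        final Long readaheadBytes;  // how far ahead to read

        Hints(Boolean dropBehind, Long readaheadBytes) {
            this.dropBehind = dropBehind;
            this.readaheadBytes = readaheadBytes;
        }
    }

    /** Returns the first non-null candidate, or null if all are null. */
    @SafeVarargs
    static <T> T firstNonNull(T... candidates) {
        for (T candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return null;
    }

    /**
     * Assumed precedence: a per-file hint wins over a per-client hint,
     * which wins over the DataNode-wide default.
     */
    static Hints resolve(Hints perFile, Hints perClient, Hints dataNodeDefaults) {
        return new Hints(
            firstNonNull(perFile.dropBehind, perClient.dropBehind,
                         dataNodeDefaults.dropBehind),
            firstNonNull(perFile.readaheadBytes, perClient.readaheadBytes,
                         dataNodeDefaults.readaheadBytes));
    }

    public static void main(String[] args) {
        Hints dataNode = new Hints(false, 4L * 1024 * 1024); // DataNode-wide defaults
        Hints client = new Hints(null, null);                // client has no opinion
        Hints file = new Hints(true, null);                  // this file: drop behind

        Hints effective = resolve(file, client, dataNode);
        System.out.println("dropBehind=" + effective.dropBehind
            + " readaheadBytes=" + effective.readaheadBytes);
    }
}
```

In this model a job that writes data it will not read again can ask for drop-behind on just that file, while every other reader and writer on the same DataNode keeps the cluster-wide defaults.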

Attachments

- HDFS-4817.001.patch, 11/May/13 00:37, 80 kB, Colin McCabe
- HDFS-4817.002.patch, 21/May/13 01:44, 85 kB, Colin McCabe
- HDFS-4817.004.patch, 23/May/13 21:09, 87 kB, Colin McCabe
- HDFS-4817.006.patch, 08/Jul/13 23:30, 88 kB, Colin McCabe
- HDFS-4817.007.patch, 09/Jul/13 01:02, 86 kB, Colin McCabe
- HDFS-4817.008.patch, 12/Jul/13 01:14, 88 kB, Colin McCabe
- HDFS-4817.009.patch, 16/Jul/13 19:14, 88 kB, Colin McCabe
- HDFS-4817.010.patch, 19/Jul/13 07:50, 90 kB, Colin McCabe
- HDFS-4817-b2.1.001.patch, 24/Sep/13 21:15, 91 kB, Colin McCabe

Issue Links

- is depended upon by: HBASE-14098 Allow dropping caches behind compactions (Closed)
- is duplicated by: HDFS-4184 Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurate (Resolved)
- is related to: HDFS-4966 implement advisory caching for RawLocalFilesystem (Open)
- relates to: HBASE-10052 use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted) (Closed)

People

Assignee: Colin McCabe

Reporter: Colin McCabe

Votes: 0

Watchers: 18

Dates

Created: 11/May/13 00:36

Updated: 12/May/16 18:13

Resolved: 23/Jul/13 17:55