[HDFS-6596] Improve InputStream when read spans two blocks - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: hdfs-client
Labels:
- BB2015-05-TBR

Target Version/s:
Tags:
DFSInputStream

Description

In the current implementation of DFSInputStream, read(buffer, offset, length) is implemented as follows:

int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
if (locatedBlocks.isLastBlockComplete()) {
  realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
}
int result = readBuffer(strategy, off, realLen, corruptedBlockMap);

From the above code, we can conclude that a read returns at most (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the caller must call read() a second time to complete the request, and must wait a second time to acquire the DFSInputStream lock (read() is synchronized in DFSInputStream). For latency-sensitive applications such as HBase, this becomes a latency pain point under heavy contention for that lock. We therefore propose looping internally in read() to do a best-effort read.
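
To make the cost concrete, here is a minimal sketch of the cap described above. It is not DFSInputStream code: the class name, the fixed blockSize parameter, and the callsNeeded helper are hypothetical, and the file-length check is left out. It only counts how many read() calls (and therefore lock acquisitions) a caller needs for one request.

```java
// Hypothetical illustration; none of these names exist in Hadoop.
final class ShortReadSketch {

    /** Mirrors the cap from the snippet above: bytes a single read() may return. */
    static int cappedLength(long pos, long blockEnd, int len) {
        return (int) Math.min(len, blockEnd - pos + 1L);
    }

    /**
     * Counts the read() calls a caller must issue to fetch len bytes starting
     * at pos, assuming every block has the same size (an assumption of this sketch).
     */
    static int callsNeeded(long pos, int len, long blockSize) {
        int calls = 0;
        while (len > 0) {
            // Offset of the last byte in the block that contains pos.
            long blockEnd = (pos / blockSize + 1) * blockSize - 1;
            int n = cappedLength(pos, blockEnd, len);
            pos += n;
            len -= n;
            calls++; // each call waits for the stream lock once
        }
        return calls;
    }

    public static void main(String[] args) {
        // A 16-byte request starting 8 bytes before a block boundary.
        System.out.println(callsNeeded(120, 16, 128));
    }
}
```

With these assumed numbers, a request that stays inside one block needs one call, while the same request straddling a boundary needs two.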

The current implementation of pread (read(position, buffer, offset, length)) already loops internally to do a best-effort read, so we can refactor the code to support the same behavior for a normal read.
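
The proposed behavior can be sketched roughly as follows. This is a simplified in-memory model, not the attached patch: the class, the byte array standing in for the file, the fixed blockSize, and the lockAcquisitions counter are all assumptions made for illustration, and datanode failure handling is omitted.

```java
// Hypothetical model of a block-backed stream; not Hadoop code.
final class BestEffortReadSketch {
    private final byte[] data;   // stands in for the file contents
    private final int blockSize; // assumed fixed block size
    private long pos;            // current stream position
    int lockAcquisitions;        // how many times read() was entered

    BestEffortReadSketch(byte[] data, int blockSize) {
        this.data = data;
        this.blockSize = blockSize;
    }

    /** Reads from the current block only, like the capped read in the description. */
    private int readWithinBlock(byte[] buf, int off, int len) {
        if (pos >= data.length) {
            return -1; // end of file
        }
        long blockEnd = (pos / blockSize + 1) * blockSize - 1;
        int realLen = (int) Math.min(len, blockEnd - pos + 1L);
        realLen = (int) Math.min(realLen, data.length - pos);
        System.arraycopy(data, (int) pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    /** Best-effort read: crosses block boundaries while holding the lock once. */
    synchronized int read(byte[] buf, int off, int len) {
        lockAcquisitions++;
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (total < len) {
            int n = readWithinBlock(buf, off + total, len - total);
            if (n < 0) {
                break; // hit end of file; return what we have
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

In this model a request that spans a block boundary is satisfied by a single read() call, so the caller waits for the lock once instead of twice.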

Attachments

- HDFS-6596.3.patch, 30/Jun/14 12:48, 11 kB, Zesheng Wu
- HDFS-6596.3.patch, 30/Jun/14 09:36, 11 kB, Zesheng Wu
- HDFS-6596.2.patch, 30/Jun/14 05:29, 11 kB, Zesheng Wu
- HDFS-6596.2.patch, 30/Jun/14 02:02, 11 kB, Zesheng Wu
- HDFS-6596.2.patch, 27/Jun/14 07:14, 11 kB, Zesheng Wu
- HDFS-6596.1.patch, 25/Jun/14 10:18, 10 kB, Zesheng Wu

People

Assignee: Zesheng Wu

Reporter: Zesheng Wu

Votes: 0

Watchers: 6

Dates

Created: 24/Jun/14 06:45

Updated: 06/May/15 03:32