HBASE-2180: Bad random read performance from synchronizing hfile.fddatainputstream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.4
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Deep in the HFile read path there is this code:

          synchronized (in) {
            in.seek(pos);
            ret = in.read(b, off, n);
          }

      This means only one read can be active per file at a time, which prevents the OS and hardware from doing any IO scheduling across lots of concurrent reads.

      We need to either use a reentrant API (pread may be partially reentrant according to Todd) or use multiple stream objects, one per scanner/thread.
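
      A minimal sketch of the positioned-read alternative mentioned above, using the same variables (in, pos, b, off, n) as the snippet; it mirrors the change discussed in the comments below rather than being the committed fix:

          // FSDataInputStream implements PositionedReadable, so this call reads at an
          // absolute offset without moving the stream's file pointer and does not
          // need the external synchronized block above.
          ret = in.read(pos, b, off, n);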

      Attachments

      1. 2180.patch (12 kB) - stack
      2. 2180-v2.patch (23 kB) - stack

        Issue Links

          Activity

          stack added a comment -

          Using pread – it's present in the code already, just commented out on the line following the one Ryan cited above – I see doubled throughput when 16 clients are concurrently random-reading out of a single regionserver, so it helps. I'll try and get some more numbers in here (I see 'wa' in top running at about the same level for both cases, but the regionserver is definitely working harder in the pread case, using about double the CPU).

          Numbers are not that good though – about 50ms latency per random read with 16 concurrent clients. This is a RS carrying 16M rows on 92 regions, where there is only 1 storefile in the family and 4 DNs under it.

          Way back when we were looking at pread, it improved random read latency by some small percentage, IIRC about 11%, but then scan speed slowed some... but those numbers were for the case of low numbers of concurrent clients.

          It's scanning 27k rows/second before the pread change using a single client, and 21k/second after.

          Let me get some more numbers... up the concurrent client count and get some other points on how pread changes throughput.

          stack added a comment -

          Link to issue that suggests we pread when doing random read and read when scanning.

          Hide
          adragomir Andrei Dragomir added a comment -

          We ran some tests that show improved performance from making the change already hinted at in BoundedRangeFileInputStream: replacing the synchronized block with the commented-out line below it, which uses the positioned-read (PositionedReadable) interface:

              //synchronized (in) {
              //  in.seek(pos);
              //  ret = in.read(b, off, n);
              //}
              ret = in.read(pos, b, off, n);
          
          stack added a comment -

          This patch has gets use pread to fetch blocks and keeps the old seek+read for scans.

          The patch removes the old HFile.Reader.getScanner methods and replaces them with a single getScanner that takes two arguments – whether to cache the blocks read and whether to use pread when pulling in a block. I got rid of the old getScanners to force all callers to be explicit about what they want with regard to caching and pread. A rough sketch of the resulting call pattern is below.

          This patch does not include tests. It's hard to write a test for this performance change.

          A further improvement would recognize short scans – i.e. scans that cover less than an hfile block. In that case we'd want to pread rather than seek+read (especially when a scan of one row replaces a get).
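
          A hedged sketch of what callers might look like with the two-argument getScanner described above (argument names and order here are assumptions for illustration; the patch itself is the reference):

              // Hypothetical call pattern after the change: every caller must say
              // whether blocks should be cached and whether to use pread.
              HFileScanner getPath  = reader.getScanner(true /* cacheBlocks */, true  /* pread: random reads */);
              HFileScanner scanPath = reader.getScanner(true /* cacheBlocks */, false /* pread: seek+read for scans */);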

          stack added a comment -

          This patch includes fixes that make the tests use the new getScanner method, plus a small PE fix for when --rows is small (we would NPE). I might need a v3: a test is failing (TestGetDeleteTracker). Need to investigate.

          In testing on something that tries to resemble the Yahoo paper's setup – ~20M rows per server, 116 regions on a RS, and only one replica – this patch seems to double the throughput with ~20 concurrent clients on a RS. I tested scans, and scan speeds are what they were with this patch in place. They have not deteriorated.

          One thing I noticed was that when the data is not local – i.e. the data is in a DN on another machine – scanning definitely has added latency... taking maybe 25% longer for the test to complete. I need to see if the same is true of random reads. Cosmin suggested that the Yahoo test, with its single replica, might be doing lots of remote accesses and could be incurring the extra latency.

          Hide
          ryanobjc ryan rawson added a comment -

          +1 thanks for doing this!

          stack added a comment -

          Committed to branch and trunk.

          stack added a comment -

          Really committed to TRUNK this time.

          stack added a comment -

          Committed a while back. Resolving.

          Erik Rozendaal added a comment -

          After applying this patch to 0.20.3 I got the following errors in my regionserver logs when doing high loads of gets and puts:

          2010-02-25 11:44:08,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region inrdb_ticket,\x07R\x00\x00\x00\x00\x80\xFF\xFF\xFF\x7F\x00\x00\x00\x01,1267094341820 in 6sec
          1177:java.net.BindException: Cannot assign requested address
          at sun.nio.ch.Net.connect(Native Method)
          at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
          at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1825)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1898)
          at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
          at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:101)
          at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
          at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
          at org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:121)
          at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:96)
          at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
          at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
          at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
          at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
          at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
          at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
          at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:108)
          at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
          at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1463)
          at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
          at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
          at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
          at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

          The DataNode logs are fine (no maximum-xcievers-exceeded errors). It turns out the OS was running out of port numbers: netstat showed more than 20,000 connections in TIME_WAIT state. Reverting to the original hbase-0.20.3 jar solved the problem; only very few (<10) TIME_WAIT connections remained even after running gets/puts for a while.

          So it looks like this patch causes some network connection issues. Any ideas if that could be the case?

          PS Running only gets seems to be fine, but I've mostly run tests with reads from the block cache.

          stack added a comment -

          Reopening to take a look.

          I have a vague recollection of stuff not being closed down if not all is read out of the socket. Thanks for reporting this Erik.

          stack added a comment -

          I was thinking of a very old issue, HADOOP-2341, but that was about CLOSE_WAIT, not TIME_WAIT. Erik, I presume the TIME_WAIT connections are on the datanode side? I suppose there could be an issue here if there are many random reads in a short amount of time and the maximum segment lifetime (MSL) is long in your TCP/IP implementation. Do you know what it is? 2 minutes seems to be the default from reading up on the internets, so sockets could sit in TIME_WAIT for 4 minutes. Is this what you are seeing, Erik? Do they go away after a while? What's the OS? This would seem to be a new issue then. We need a pread that does keep-alive, reusing sockets (Todd!).

          Erik Rozendaal added a comment -

          I saw both CLOSE_WAIT and TIME_WAIT. Maybe CLOSE_WAIT was in the majority. Connections were mostly to the data node.

          $ uname -a
          Linux inrdb-worker1.ripe.net 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

          They do go away after a while, since after a few of the "Cannot assign requested address" exceptions the server starts working again.

          Unfortunately I'll be away for the weekend and won't be able to investigate further. I wonder why so many connections are being opened so quickly that the server runs out of ports within a few minutes of starting the gets/puts?

          stack added a comment -

          bq. I wonder why so many connections are being opened so quickly that the server runs out of ports within a few minutes of starting the gets/puts?

          Gets use hdfs pread, and pread opens a socket per access. My guess is that a high rate of gets soon overwhelms the time each socket takes to clean up after close. What kind of rates are we talking about here, Erik? A rough back-of-the-envelope for how this runs out of ports is below.
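
          A hedged back-of-the-envelope (the port range is the usual Linux default, not a measurement from Erik's box; the 4-minute TIME_WAIT is the figure from the earlier comment):

              ephemeral ports available       ≈ 61000 - 32768 ≈ 28k   (default ip_local_port_range)
              TIME_WAIT per connection        ≈ 2 x MSL ≈ 4 min = 240 s
              sustainable new-connection rate ≈ 28000 / 240 ≈ 117 per second to a single datanode

          So any sustained get rate much above roughly a hundred per second, with one new socket per pread, would pile up TIME_WAIT sockets until "Cannot assign requested address" appears, which is consistent with the 20,000+ connections Erik saw.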

          Todd Lipcon added a comment -

          In the absence of reusing sockets, I think the TIME_WAIT issue could be dealt with at the system level by toggling /proc/sys/net/ipv4/tcp_tw_recycle.

          stack added a comment -

          Resolving against 0.20.4. I opened HBASE-2492 to cover the underlying issue of a new socket per pread.


            People

            • Assignee:
              stack
            • Reporter:
              ryan rawson
            • Votes:
              1
            • Watchers:
              10
