Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      S3A has added support for configurable input policies. Similar to fadvise, this configuration provides applications with a way to specify their expected access pattern (sequential or random) while reading a file. S3A then performs optimizations tailored to that access pattern. See site documentation of the fs.s3a.experimental.input.fadvise configuration property for more details. Please be advised that this feature is experimental and subject to backward-incompatible changes in future releases.
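      As a rough sketch of how an application might opt into the policy (the bucket and path below are placeholders, and the property is experimental as noted above):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class S3AFadviseExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Advise S3A that access will be random (e.g. ORC/Parquet footer then stripes);
          // "sequential" is the other policy described in the release note.
          conf.set("fs.s3a.experimental.input.fadvise", "random");

          Path path = new Path("s3a://example-bucket/warehouse/part-00000.orc");
          try (FileSystem fs = path.getFileSystem(conf);
               FSDataInputStream in = fs.open(path)) {
            long len = fs.getFileStatus(path).getLen();
            byte[] tail = new byte[(int) Math.min(len, 16 * 1024)];
            in.readFully(len - tail.length, tail);   // footer-style positioned read near EOF
          }
        }
      }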

      Description

      Currently, the file's contentLength is set as the requestedStreamLen when invoking S3AInputStream::reopen(). As part of lazySeek(), the stream sometimes has to be closed and reopened, but it is often closed with abort(), which leaves the internal HTTP connection unusable. This incurs significant connection-establishment cost in some jobs. It would be good to set the correct value for the stream length to avoid connection aborts.
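      A minimal sketch of the sizing being proposed (method and parameter names are illustrative, not the actual S3AInputStream fields): rather than always requesting up to contentLength, the ranged GET would be bounded by the target position, the bytes actually wanted, and the read-ahead window.

      /** Illustrative only: how far should a ranged GET extend? */
      final class StreamLengthSketch {
        static long requestedStreamLength(long targetPos, long bytesWanted,
            long readahead, long contentLength) {
          // Old behaviour: effectively contentLength, so closing before EOF
          // usually has to abort() the HTTP connection.
          // Proposed behaviour: only as much as the read plus read-ahead needs,
          // capped at the end of the object.
          long desired = Math.max(targetPos + readahead, targetPos + bytesWanted);
          return Math.min(contentLength, desired);
        }

        public static void main(String[] args) {
          // 1 MB object, 8 KB read at offset 128 KB, 64 KB read-ahead window.
          System.out.println(requestedStreamLength(128 * 1024, 8 * 1024,
              64 * 1024, 1024 * 1024));   // 196608 rather than 1048576
        }
      }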

      I will post the patch once the AWS tests pass on my machine.

      1. HADOOP-13203-branch-2-010.patch
        57 kB
        Steve Loughran
      2. HADOOP-13203-branch-2-009.patch
        57 kB
        Steve Loughran
      3. HADOOP-13203-branch-2-008.patch
        53 kB
        Steve Loughran
      4. HADOOP-13203-branch-2-007.patch
        42 kB
        Steve Loughran
      5. HADOOP-13203-branch-2-006.patch
        43 kB
        Steve Loughran
      6. HADOOP-13203-branch-2-005.patch
        28 kB
        Steve Loughran
      7. stream_stats.tar.gz
        716 kB
        Rajesh Balamohan
      8. HADOOP-13203-branch-2-004.patch
        6 kB
        Rajesh Balamohan
      9. HADOOP-13203-branch-2-003.patch
        6 kB
        Rajesh Balamohan
      10. HADOOP-13203-branch-2-002.patch
        6 kB
        Rajesh Balamohan
      11. HADOOP-13203-branch-2-001.patch
        3 kB
        Rajesh Balamohan

        Issue Links

          Activity

          stevel@apache.org Steve Loughran added a comment -

          So you are proposing some shorter block size for reads, on the basis that it allows follow-on GETs to use the same SSL connection?

          How do you know how much to ask for? Or: how do you handle the end of the connection and then start reading the next block? Presumably the cost of that will be lower (reused connection and all), but the stream reading will need to recognise premature EOFs and react.

          rajesh.balamohan Rajesh Balamohan added a comment -

          Yes, Steve Loughran. In workloads like Hive there are lots of random seeks, and the internal connection frequently had to be aborted. With this patch it is much cheaper to reuse the connection. The amount of data to request can be determined by Math.max(targetPos + readahead, targetPos + length).

          From the AWS unit-test perspective, the following issues were observed:

          Test timeout failures:

          • TestS3ADeleteManyFiles.testBulkRenameAndDelete
          • org.apache.hadoop.fs.contract.s3a.TestS3AContractDistCp.largeFilesToRemote, largeFilesFromRemote
          • org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles.testBulkRenameAndDelete

          Other failures

          • org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir (Root directory operation rejected) - This is already tracked in another jira.
          • org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance.testReadAheadDefault/testReadBigBlocksBigReadahead: earlier these expected 1 open, but now there can be multiple, since requestedStreamLen is no longer the file's length. At most we save a single read-ahead call; for the rest, the stream has to be opened multiple times. This is still acceptable compared with the connection re-establishments in real workloads (e.g. Hive), where a completely random set of ranges can be requested. I have not updated the patch to fix this failure; based on your input, I can revise the patch.
          rajesh.balamohan Rajesh Balamohan added a comment - - edited

          Attaching the .2 patch with test coverage; it also fixes TestS3AInputStreamPerformance.testReadAheadDefault/testReadBigBlocksBigReadAhead. Ran the AWS S3A tests locally (everything passes, with timeout exceptions in TestS3ADeleteManyFiles, TestS3ADeleteFilesOneByOne, and DistCp).

          cnauroth Chris Nauroth added a comment -

          Rajesh, thank you for the patch. I have to apologize. I think this might be a regression that traces back to code review feedback I gave Steve on HADOOP-13028:

          https://issues.apache.org/jira/browse/HADOOP-13028?focusedCommentId=15267729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15267729

          My thinking during the HADOOP-13028 patch was that we might want to keep on reading through the same stream, regardless of the limit of the "current" read call, so we might as well request the whole content in the HTTP request. I was attempting to optimize away extraneous additional HTTP calls. It appears there was an unintended side effect.

          I want to make sure I understand the problem here fully. Right now, I don't think I understand why the aborts were happening. Is it because requesting the full content, in combination with Hive's random seek workloads, left the underlying HTTP connection untouched and idle for a long time? Then, after a while, the HTTP connection was deemed inactive/not fully consumed, it assumed there was some kind of client error, and then the whole TCP connection was shut down?

          It's nice to see a comment on the requestedStreamLen calculation. Thank you for adding that. I might ask for some further details to be added to that comment, after I feel like I have a full understanding of the issue.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 1m 45s Maven dependency ordering for branch
          +1 mvninstall 6m 40s branch-2 passed
          +1 compile 6m 14s branch-2 passed with JDK v1.8.0_91
          +1 compile 6m 40s branch-2 passed with JDK v1.7.0_101
          +1 checkstyle 1m 21s branch-2 passed
          +1 mvnsite 1m 19s branch-2 passed
          +1 mvneclipse 0m 30s branch-2 passed
          +1 findbugs 2m 12s branch-2 passed
          +1 javadoc 1m 8s branch-2 passed with JDK v1.8.0_91
          +1 javadoc 1m 25s branch-2 passed with JDK v1.7.0_101
          0 mvndep 0m 13s Maven dependency ordering for patch
          +1 mvninstall 0m 58s the patch passed
          +1 compile 6m 5s the patch passed with JDK v1.8.0_91
          +1 javac 6m 5s the patch passed
          +1 compile 6m 38s the patch passed with JDK v1.7.0_101
          +1 javac 6m 38s the patch passed
          -1 checkstyle 1m 26s root: The patch generated 3 new + 22 unchanged - 0 fixed = 25 total (was 22)
          +1 mvnsite 1m 17s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          -1 whitespace 0m 0s The patch has 49 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 findbugs 0m 46s hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 1m 7s the patch passed with JDK v1.8.0_91
          +1 javadoc 1m 21s the patch passed with JDK v1.7.0_101
          -1 unit 7m 54s hadoop-common in the patch failed with JDK v1.8.0_91.
          +1 unit 0m 11s hadoop-aws in the patch passed with JDK v1.8.0_91.
          -1 unit 7m 57s hadoop-common in the patch failed with JDK v1.7.0_101.
          +1 unit 0m 23s hadoop-aws in the patch passed with JDK v1.7.0_101.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          70m 1s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-aws
            Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AInputStream.readahead; locked 60% of time. Unsynchronized access at S3AInputStream.java:[line 567]
          JDK v1.8.0_91 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics
          JDK v1.7.0_101 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:babe025
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12807708/HADOOP-13203-branch-2-002.patch
          JIRA Issue HADOOP-13203
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 72b43f900f48 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / cddf6b4
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-aws.html
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_91.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_101.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_91.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_101.txt
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9655/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          rajesh.balamohan Rajesh Balamohan added a comment - - edited

          Thanks for the comments, Chris Nauroth. In Hive there can be lots of random seeks. For a backwards seek, the stream has to be closed and reopened. As part of closing the stream, a decision is made on whether the connection can be reused or must be aborted. If it is aborted, the connection becomes unusable and subsequent calls have to go through the expensive process of re-establishing it.

          For example, assume it is reading a 1 MB file and the current position is at 512 KB. If it has to seek back to the 128 KB mark, it ends up closing the stream. With the earlier logic, since the file's contentLength was set as the requestedStreamLen, the check (length - pos > CLOSE_THRESHOLD) (where length is requestedStreamLen) was true, so the connection was aborted. In effect, any backwards seek aborted the connection, and the time taken to establish a new connection was far more expensive than reading the small amount of data actually being requested.
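          A hedged sketch of that decision (the constant and method names are illustrative, not the actual S3AInputStream code): when the requested length equals contentLength, the remaining bytes are almost always above the threshold, so the backwards seek forces an abort; with a bounded request the stream can be drained and the connection reused.

          /** Illustrative only: should closing the wrapped stream abort the connection? */
          final class CloseDecisionSketch {
            static final long CLOSE_THRESHOLD = 16 * 1024;   // assumed threshold for the sketch

            static boolean shouldAbort(long requestedLength, long pos) {
              // Aborting frees the connection immediately but makes it unusable for reuse;
              // draining the remaining bytes keeps the pooled connection alive.
              return requestedLength - pos > CLOSE_THRESHOLD;
            }

            public static void main(String[] args) {
              long contentLength = 1024 * 1024;   // 1 MB file
              long pos = 512 * 1024;              // at 512 KB, about to seek back to 128 KB
              // Old logic: length == contentLength, so the backwards seek aborts.
              System.out.println(shouldAbort(contentLength, pos));   // true  -> abort
              // Bounded request ending just past the current position: drain and reuse.
              System.out.println(shouldAbort(520 * 1024, pos));      // false -> reuse
            }
          }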

          cnauroth Chris Nauroth added a comment -

          Rajesh, thank you for the further explanation. Sorry for my earlier confusion. I was misinterpreting the word "abort" to mean something happening at the TCP layer, e.g. an RST packet sent from the S3 back-end. Now I understand that we're really talking about our own abort logic in S3AInputStream#closeStream.

          Now that I understand the goal of this change, I can code review it. I'll try to do that later today (PST).

          cnauroth Chris Nauroth added a comment -
          1. The comment "In case this is set to contentLength, expect lots of connection closes with abort..." is not entirely accurate. I see how this is true for usage that seeks backward, but it's not true for usage that seeks forward a lot, as demonstrated during the HADOOP-13028 review. (More on this topic below.)
          2. Would you please revert the change in S3AInputStream#setReadahead? This is a public API, and the contract of that API is defined in interface CanSetReadahead. It states that callers are allowed to pass null to reset the read-ahead to its default value. This matches the behavior implemented by HDFS. The logic currently in S3A implements it correctly, but with this patch applied, it would cause a NullPointerException if a caller passed null.
          3. In TestS3AInputStreamPerformance, I see why these changes were required to make the tests pass, but it highlights that this change partly reverts what was achieved in HADOOP-13028 to minimize reopens on forward seeks. Before this patch, testReadAheadDefault generated 1 open. After applying the patch, I see it generating 343 opens. It seems we can't fully optimize forward seek without harming backwards seek due to the unintended aborts. I suppose one option would be to introduce an optional advice API, similar to calling fadvise(FADV_SEQUENTIAL) that forward-seeking applications could call. That would be a much bigger change though. I don't see a way to achieve anything better right now, although it's probably good that you changed closeStream to consider read-ahead instead of the old CLOSE_THRESHOLD to determine whether or not to abort. Steve, do you have any further thoughts on this?
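          Regarding point 2, a minimal null-tolerant setter along the lines of the CanSetReadahead contract might look like the sketch below (the default constant is an assumption, not the real S3A value):

          /** Illustrative only: null resets read-ahead to the default, per CanSetReadahead. */
          class ReadaheadSketch {
            private static final long DEFAULT_READAHEAD = 64 * 1024;   // assumed default
            private long readahead = DEFAULT_READAHEAD;

            public synchronized void setReadahead(Long readahead) {
              if (readahead == null) {
                this.readahead = DEFAULT_READAHEAD;   // reset instead of throwing NPE
              } else if (readahead < 0) {
                throw new IllegalArgumentException("Negative readahead value: " + readahead);
              } else {
                this.readahead = readahead;
              }
            }
          }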
          rajesh.balamohan Rajesh Balamohan added a comment -

          Thanks Chris Nauroth. Updated the patch.

          1. Removed changes related to setReadAhead.
          2. Minor update to the comment in reopen.
          3. I agree with your comments on forward/backward seeks. Without the patch it would be difficult to reduce the number of open calls on backward seeks. An additional option for reducing the number of open calls on forward seeks is to set a higher value for readAhead (see the example below).
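          For item 3, the read-ahead window could be raised at the configuration level; a hedged example assuming the fs.s3a.readahead.range property added by HADOOP-13028, with an arbitrary 1 MB value:

          import org.apache.hadoop.conf.Configuration;

          public class ReadaheadConfigExample {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              // A larger read-ahead means forward seeks within the window reuse the open
              // stream instead of triggering another ranged GET; 1 MB is only an example.
              conf.setLong("fs.s3a.readahead.range", 1024 * 1024);
              System.out.println(conf.getLong("fs.s3a.readahead.range", 64 * 1024));
            }
          }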

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 11m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 38s Maven dependency ordering for branch
          +1 mvninstall 6m 24s branch-2 passed
          +1 compile 5m 42s branch-2 passed with JDK v1.8.0_91
          +1 compile 6m 27s branch-2 passed with JDK v1.7.0_101
          +1 checkstyle 1m 23s branch-2 passed
          +1 mvnsite 1m 14s branch-2 passed
          +1 mvneclipse 0m 27s branch-2 passed
          +1 findbugs 2m 8s branch-2 passed
          +1 javadoc 1m 5s branch-2 passed with JDK v1.8.0_91
          +1 javadoc 1m 20s branch-2 passed with JDK v1.7.0_101
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 0m 56s the patch passed
          +1 compile 5m 48s the patch passed with JDK v1.8.0_91
          +1 javac 5m 48s the patch passed
          +1 compile 6m 22s the patch passed with JDK v1.7.0_101
          +1 javac 6m 22s the patch passed
          -1 checkstyle 1m 21s root: The patch generated 3 new + 21 unchanged - 0 fixed = 24 total (was 21)
          +1 mvnsite 1m 14s the patch passed
          +1 mvneclipse 0m 32s the patch passed
          -1 whitespace 0m 0s The patch has 49 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 2m 34s the patch passed
          +1 javadoc 1m 8s the patch passed with JDK v1.8.0_91
          +1 javadoc 1m 26s the patch passed with JDK v1.7.0_101
          +1 unit 8m 27s hadoop-common in the patch passed with JDK v1.7.0_101.
          +1 unit 0m 14s hadoop-aws in the patch passed with JDK v1.7.0_101.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          78m 35s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:babe025
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12808236/HADOOP-13203-branch-2-003.patch
          JIRA Issue HADOOP-13203
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux cd11cfa13ba6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / c11c2ee
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9665/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/9665/artifact/patchprocess/whitespace-eol.txt
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9665/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9665/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          It looks like, as people note, the change may make forward seeking, or a mix of seek() + read() calls, more expensive. More specifically, it could well accelerate a sequence of readFully() offset calls, but not handle so well situations of seek(pos) + read(pos, n) + seek(pos + n + n2), which the forward skipping could handle.

          Even regarding readFully() calls, it isn't going to handle well any mix of read()+readFully(), as the first read will have triggered a to-end-of-file read.

          It seems to me that one could actually get something of both, where all reads specified a block length, such as 64 KB. On sustained forward reads, when the boundary was hit it would read the next block forward. On mixed seek/read operations, where the range of the read is unknown, this would significantly optimise random-access use, rather than only those workloads which exclusively used one read operation.

          And here's the problem: right now we don't know what API/file use modes are in widespread use against S3. We don't have the data. I can see what you're highlighting: the current mechanism is very expensive for backwards seeks, but we have just optimised forward seeking and instrumented the code to collect detail on what's actually going on.

          1. I don't want to rush into a change which has the potential to make some existing codepaths worse —especially as we don't know how the FS gets used.
          2. I'd really like to see collected statistics on FS usage across a broad dataset. Anyone here is welcome to contribute to this —it should include statistics gathered in downstream use.

          I'm very tempted to argue this should be an S3a phase III improvement: it has ramifications, and we should do it well. We are, with the metrics, in a position to understand those ramifications and, if not in a rush, implement something which works well for a broad set of uses

          cnauroth Chris Nauroth added a comment -

          Rajesh, thank you for patch 003. That addresses my comments 1 and 2, though it looks like we still need to come to consensus on point 3 (optimized forward seek/scan vs. optimized backward seek).

          stevel@apache.org Steve Loughran added a comment -

          I'm thinking of something more sophisticated, which I'm willing to do; it's just a phase III kind of problem (post Hadoop 2.8).

          1. We have the notion of a read block size, say 64 KB. This block size should be consistent with the block sizes used in the AWS code/HTTP client.
          2. We always read aligned with the block size.
          3. For a simple seek()/read() at position P, the read would be from P to the next block boundary > P.
          4. If the number of bytes to be read is known (seek + read(bytes), read-fully, read-positioned, positioned read-fully), we'd read up to the next block boundary, or, if the full read would span a block, up to the next block boundary past the final read position.
          5. During a read, an EOF exception would trigger a new read (and/or the latest block size is tracked and managed in the S3AInputStream).
          6. Whenever a seek/positioned read in a new location is needed, the data up to the end of the next block is read in.
          7. For forward seeks where the data is in the current block, skip the bytes.
          8. For forward seeks where the data is in a later block, read to the end of the current block, then read from the new location to the end of its block.

          It means that for a forward scan through the file, the number of block reads is fileLength/blockSize.
          For backward seeks of any kind, the amount of data read is the remainder of the current block plus the data read.
          For forward seeks, if the data is in the current block, the amount of data read is readLocation - currentLocation.
          For forward seeks, if the data is not in the block, the cost of a seek equals that of a backward seek.

          So: short hop forward seeks and sequential reading is not very expensive; backwards and long-distance forward seeks have a predictable cost —one which is the same irrespective of the destination of the seek.
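          A minimal sketch of the block-aligned range calculation described above (the 64 KB block size and helper names are illustrative):

          /** Illustrative only: round a ranged GET up to the next block boundary or EOF. */
          final class BlockAlignedRangeSketch {
            static final long BLOCK_SIZE = 64 * 1024;   // example read block size

            /** End (exclusive) of the ranged request for a read starting at targetPos. */
            static long rangeEnd(long targetPos, long bytesWanted, long contentLength) {
              long lastByteNeeded = targetPos + Math.max(bytesWanted, 1);
              // Round up to the next block boundary past the final read position.
              long alignedEnd = ((lastByteNeeded + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
              return Math.min(alignedEnd, contentLength);
            }

            public static void main(String[] args) {
              // Read 10 bytes at offset 100 in a 1 MB object: request bytes [100, 65536).
              System.out.println(rangeEnd(100, 10, 1024 * 1024));       // 65536
              // A read spanning a boundary is rounded up to the block after it.
              System.out.println(rangeEnd(65000, 2000, 1024 * 1024));   // 131072
            }
          }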

          rajesh.balamohan Rajesh Balamohan added a comment -

          Thanks for sharing the details. Would that mean GetObjectRequest.withRange(targetPos, requestedStreamLen) would be block aligned as well?
          With the latest patch, the forward-seek cost is fileLen/blockSize. A backward seek is a soft close of the current connection (reading the remaining data up to the block boundary) followed by a request for fresh data.

          stevel@apache.org Steve Loughran added a comment -

          > Would that mean GetObjectRequest.withRange(targetPos, requestedStreamLen) would be block aligned as well?

          Yes, I think we'd always round up the end of a request to the next block boundary (or EOF, whichever comes first). That way, if there is another positioned read directly after, or within the same block as, the previous one, there is no need to make another request.
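          A hedged sketch of what that could look like at the SDK call site (bucket and key are placeholders; the inclusive range end and the block size are assumptions of the sketch, not the final design):

          import com.amazonaws.services.s3.AmazonS3;
          import com.amazonaws.services.s3.model.GetObjectRequest;
          import com.amazonaws.services.s3.model.S3Object;

          /** Illustrative only: issue a block-aligned ranged GET. */
          final class BlockAlignedGetSketch {
            private static final long BLOCK_SIZE = 64 * 1024;   // example block size

            static S3Object openRange(AmazonS3 s3, String bucket, String key,
                long targetPos, long contentLength) {
              // Round the end of the request up to the next block boundary, capped at EOF.
              long end = Math.min(contentLength, ((targetPos / BLOCK_SIZE) + 1) * BLOCK_SIZE);
              GetObjectRequest request = new GetObjectRequest(bucket, key)
                  .withRange(targetPos, end - 1);   // HTTP Range headers are inclusive
              return s3.getObject(request);
            }
          }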

          rajesh.balamohan Rajesh Balamohan added a comment -

          There is a corner case wherein closing the stream should use requestedStreamLen instead of contentLength to avoid a connection abort. This would show up when long-running services in the cluster exercise this codepath. Fixed this in the latest patch.

          Also, I collected the stream access profiles for a couple of TPC-DS and TPC-H queries by printing the stream statistics during close in the cluster where I tested this. Attaching those logs here. Please note that this was done with the ORC data format, which reads the footer first and then starts reading the stripe information.

          1. In TPC-DS most of the files are small, so they end up with a single backwards seek during file reading: the reader reads the postscript/footer/meta details as the first operation and then seeks backwards to read the data portion of the file. Without the patch, it would abort the connection, as the difference between the file length and the current position would be much higher than CLOSE_THRESHOLD.

          Example log:

          2016-06-15 09:00:31,546 [INFO] [TezChild] |s3a.S3AFileSystem|: S3AInputStream{s3a://xyz/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2450967/000456_0 pos=4162453 nextReadPos=4162453 contentLength=7630589 StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=4, Aborted=0, SeekOperations=3, ReadExceptions=0, ForwardSeekOperations=2, BackwardSeekOperations=1, BytesSkippedOnSeek=5963, BytesBackwardsOnSeek=7629525, BytesRead=740946, BytesRead excluding skipped=734983, ReadOperations=91, ReadFullyOperations=0, ReadsIncomplete=85}}
          

          There are also file accesses without any backward seeks, wherein the reader fetches the standard 16 KB of footer details and closes the file without any additional reads.
          Example log:

          2016-06-15 09:00:28,590 [INFO] [TezChild] |s3a.S3AFileSystem|: S3AInputStream{s3a://xyz/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2450993/000213_0 pos=7549954 nextReadPos=7549954 contentLength=7549954 StreamStatistics{OpenOperations=1, CloseOperations=1, Closed=1, Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, BytesRead=16384, BytesRead excluding skipped=16384, ReadOperations=1, ReadFullyOperations=0, ReadsIncomplete=0}}
          

          2. The TPC-H dataset has relatively large files (e.g. each file in the lineitem dataset is around 1 GB in the overall 1 TB TPC-H dataset). In such cases, equal numbers of forward and backward seeks happen (around 24 per file in the log). The patch avoids connection aborts on the backward seeks.
          Example log:

          2016-06-15 09:26:26,671 [INFO] [TezChild] |s3a.S3AFileSystem|: S3AInputStream{s3a://xyz/tpch_flat_orc_1000.db/lineitem/000041_0 pos=728756230 nextReadPos=728756230 contentLength=739566852 StreamStatistics{OpenOperations=72, CloseOperations=72, Closed=72, Aborted=0, SeekOperations=48, ReadExceptions=0, ForwardSeekOperations=24, BackwardSeekOperations=24, BytesSkippedOnSeek=167662, BytesBackwardsOnSeek=737556392, BytesRead=244894978, BytesRead excluding skipped=244727316, ReadOperations=28457, ReadFullyOperations=0, ReadsIncomplete=28217}}
          
          stevel@apache.org Steve Loughran added a comment -

          -1

          I accept your contention that it works well for your benchmark. However, it does so at the expense of being pathologically bad for anything reading a file in other ways. I can demonstrate this with the log from one of my SPARK-7481 runs. Essentially, reading a 20 MB .csv.gz file has gone from under 20s to about 6 minutes: roughly 20x slower.

          2016-06-16 18:34:21,350 INFO  scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Job 0 finished: count at S3LineCount.scala:99, took 350.460510 s
          2016-06-16 18:34:21,355 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - Duration of  count s3a://landsat-pds/scene_list.gz = 350,666,373,013 ns
          2016-06-16 18:34:21,355 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - line count = 514524
          2016-06-16 18:34:21,356 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - File System = S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel, partSize=5242880, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=1048576, multiPartThreshold=5242880, statistics {22626526 bytes read, 0 bytes written, 3 read ops, 0 large read ops, 0 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=03e96d8b-c5d4-4b3c-8b9d-04931588912b-landsat-pds} {fsURI=s3a://landsat-pds/scene_list.gz} {files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {directories_created=0} {directories_deleted=0} {ignored_errors=1} {invocations_copyfromlocalfile=0} {invocations_exists=0} {invocations_getfilestatus=3} {invocations_globstatus=1} {invocations_is_directory=0} {invocations_is_file=0} {invocations_listfiles=0} {invocations_listlocatedstatus=0} {invocations_liststatus=0} {invocations_mdkirs=0} {invocations_rename=0} {object_copy_requests=0} {object_delete_requests=0} {object_list_requests=0} {object_metadata_requests=3} {object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0} {streamReadOperations=1584} {streamForwardSeekOperations=0} {streamBytesRead=22626526} {streamSeekOperations=0} {streamReadExceptions=0} {streamOpened=1584} {streamReadOperationsIncomplete=1584} {streamAborted=0} {streamReadFullyOperations=0} {streamClosed=1584} {streamBytesSkippedOnSeek=0} {streamCloseOperations=1584} {streamBytesBackwardsOnSeek=0} {streamBackwardSeekOperations=0} }}
          
          

          And without the patch

          2016-06-16 18:37:55,688 INFO  scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Job 0 finished: count at S3LineCount.scala:99, took 15.853566 s
          2016-06-16 18:37:55,693 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - Duration of  count s3a://landsat-pds/scene_list.gz = 16,143,975,760 ns
          2016-06-16 18:37:55,693 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - line count = 514524
          2016-06-16 18:37:55,694 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - File System = S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel, partSize=5242880, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=1048576, multiPartThreshold=5242880, statistics {22626526 bytes read, 0 bytes written, 3 read ops, 0 large read ops, 0 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=96650849-6e33-441f-a976-e74443239ad6-landsat-pds} {fsURI=s3a://landsat-pds/scene_list.gz} {files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {directories_created=0} {directories_deleted=0} {ignored_errors=1} {invocations_copyfromlocalfile=0} {invocations_exists=0} {invocations_getfilestatus=3} {invocations_globstatus=1} {invocations_is_directory=0} {invocations_is_file=0} {invocations_listfiles=0} {invocations_listlocatedstatus=0} {invocations_liststatus=0} {invocations_mdkirs=0} {invocations_rename=0} {object_copy_requests=0} {object_delete_requests=0} {object_list_requests=0} {object_metadata_requests=3} {object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0} {streamReadOperations=2601} {streamForwardSeekOperations=0} {streamBytesRead=22626526} {streamSeekOperations=0} {streamReadExceptions=0} {streamOpened=1} {streamReadOperationsIncomplete=2601} {streamAborted=0} {streamReadFullyOperations=0} {streamClosed=1} {streamBytesSkippedOnSeek=0} {streamCloseOperations=1} {streamBytesBackwardsOnSeek=0} {streamBackwardSeekOperations=0} }}
          2016-06-16 18:37:55,694 INFO  examples.S3LineCount (Logging.scala:logInfo(58)) - Stopping Spark Context
          

          The test, I believe, simply reads in the whole file: no seeks, no skipping. I see the number of stream open calls has gone from 1 to 1584; I suspect that is what's at play here.

          I think this code needs what I suggested: some block mechanism which works with plain read() calls as well as read operations where the full length is known. It also needs to handle the scenario of a read(byte[]) which overshoots the block currently being read, not by closing the current read and discarding the data, but by reading all the data it can from the current block and then starting to read the new block.

          Also

          • The default block size needs to be significantly bigger.
          • A new S3A scale test to do a full byte-by-byte scan through a file; this will pick up performance problems immediately (a rough sketch follows this list; I may put that in myself anyway, to catch similar problems in other patches).
          • I think some more instrumentation would be good here, specifically "bytes from current read discarded".
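          A rough sketch of such a full-scan check (no particular test harness assumed; the default path is the public landsat file used in the runs above, and the run is timed rather than asserted):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class FullScanCheck {
            public static void main(String[] args) throws Exception {
              Path path = new Path(args.length > 0 ? args[0]
                  : "s3a://landsat-pds/scene_list.gz");
              long start = System.nanoTime();
              long bytes = 0;
              try (FileSystem fs = path.getFileSystem(new Configuration());
                   FSDataInputStream in = fs.open(path)) {
                // Byte-by-byte forward scan: with a healthy sequential read path this
                // should translate into very few GETs (streamOpened close to 1).
                while (in.read() >= 0) {
                  bytes++;
                }
              }
              double seconds = (System.nanoTime() - start) / 1e9;
              System.out.printf("read %d bytes in %.1fs%n", bytes, seconds);
            }
          }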
          Hide
          cnauroth Chris Nauroth added a comment -

Steve, thanks for posting the details of the instrumentation. I think the key point there is {streamOpened=1584} vs. {streamOpened=1}. If the Spark test only triggered 1 streamOpened (without this patch), then it must have been a full forward-scan usage pattern. This matches my earlier observation about TestS3AInputStreamPerformance#testReadAheadDefault, so for future testing, that's a test case we can focus on, decoupled from Spark.

          Hide
          stevel@apache.org Steve Loughran added a comment -

Chris, it's working over a .gz file; that codec has to go through the entire file.

We also have to do some measurements of seeks on large server-side-encrypted files: if the decryption is in blocks, seeks should be affordable. If it has to start from the front each time, we'd expect the duration of open(), seek(pos), read() to be O(pos).
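
For illustration, a rough sketch of how that measurement could be scripted against the public FileSystem API; the bucket, object path and seek offsets are placeholders, not anything from this issue:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekTiming {
      public static void main(String[] args) throws Exception {
        // Placeholder: a large server-side-encrypted object.
        Path path = new Path("s3a://example-bucket/encrypted/large-file.bin");
        FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), new Configuration());
        byte[] buffer = new byte[1024];
        // If open() + seek(pos) + read() grows linearly with pos, decryption is not block-based.
        for (long pos : new long[]{1L << 20, 16L << 20, 64L << 20, 256L << 20}) {
          long start = System.nanoTime();
          try (FSDataInputStream in = fs.open(path)) {
            in.seek(pos);
            in.readFully(buffer, 0, buffer.length);
          }
          System.out.printf("pos=%d duration=%d ns%n", pos, System.nanoTime() - start);
        }
      }
    }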

          Hide
          stevel@apache.org Steve Loughran added a comment -

          Performance of the HADOOP-13286 patch

          testDecompression128K: Decompress with a 128K readahead
          
          2016-06-17 17:14:57,072 [Thread-0] INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.gz]
          2016-06-17 17:15:32,986 [Thread-0] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to read 514690 lines [99896260 bytes expanded, 22633778 raw] with readahead = 131072: 36,078,064,490 nS
          2016-06-17 17:15:32,986 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(144)) - Time per IOP: 70,096 nS
          2016-06-17 17:15:32,987 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(306)) - Stream Statistics
          StreamStatistics{OpenOperations=175, CloseOperations=175, Closed=175, Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, BytesRead=22633778, BytesRead excluding skipped=22633778, ReadOperations=6680, ReadFullyOperations=0, ReadsIncomplete=1583}
          
          Hide
          stevel@apache.org Steve Loughran added a comment -

          patch 005. This is a WiP, just wanted to push it up to show where I'm going here.

          The key change is that it introduces the notion of an InputStrategy to S3a, currently: general, positioned, sequential

As of now, there's also no diff between positioned and general: they both say "to end of stream"; I think general may want to consider a slightly shorter range, though still something big.
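
For illustration only, a minimal sketch of what such a policy enum and its range choice might look like; the names and logic here are assumptions based on this WiP description (general/positioned/sequential), not the committed S3AInputPolicy class:

    /** Illustrative sketch only; not the actual S3A code. */
    enum InputStrategy {
      /** General-purpose reads: currently request to the end of the file. */
      GENERAL,
      /** Optimised for seek + positioned reads: request only what is needed. */
      POSITIONED,
      /** Whole-file sequential scan: request to the end of the file. */
      SEQUENTIAL;

      /** Pick the end of the GET range for a read of length bytes at targetPos. */
      long requestRangeEnd(long targetPos, long length, long contentLength, long readahead) {
        switch (this) {
          case POSITIONED:
            return Math.min(contentLength, targetPos + Math.max(length, readahead));
          case GENERAL:
          case SEQUENTIAL:
          default:
            return contentLength;
        }
      }
    }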

          Logic of seekInStream enhanced to not try seeking if the end of the range passed in is beyond the end of the current read.

          The metrics track more details on range overshoot

          Limits
          -now need to test both codepaths. The strategy can be set on an instantiated FS instance to allow testing without recreating FS instances.
-still wasteful of data in the current read if the next read overshoots (maybe a counter could track the missed quantity there); then go to having read(bytes[]) return the amount of available data, with the readFully() calls handling the incomplete response by asking for more.
          -what would a good policy for "general" be? Not positioned, clearly...but is sequential it?

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 19s Maven dependency ordering for branch
          +1 mvninstall 6m 37s branch-2 passed
          +1 compile 5m 30s branch-2 passed with JDK v1.8.0_91
          +1 compile 6m 21s branch-2 passed with JDK v1.7.0_101
          +1 checkstyle 1m 21s branch-2 passed
          +1 mvnsite 1m 19s branch-2 passed
          +1 mvneclipse 0m 33s branch-2 passed
          +1 findbugs 2m 12s branch-2 passed
          +1 javadoc 0m 59s branch-2 passed with JDK v1.8.0_91
          +1 javadoc 1m 11s branch-2 passed with JDK v1.7.0_101
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 0m 56s the patch passed
          +1 compile 7m 0s the patch passed with JDK v1.8.0_91
          +1 javac 7m 0s the patch passed
          +1 compile 6m 43s the patch passed with JDK v1.7.0_101
          +1 javac 6m 43s the patch passed
          -1 checkstyle 1m 21s root: The patch generated 5 new + 44 unchanged - 7 fixed = 49 total (was 51)
          +1 mvnsite 1m 20s the patch passed
          +1 mvneclipse 0m 28s the patch passed
          -1 whitespace 0m 0s The patch has 50 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 findbugs 0m 46s hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 57s the patch passed with JDK v1.8.0_91
          +1 javadoc 1m 13s the patch passed with JDK v1.7.0_101
          +1 unit 7m 50s hadoop-common in the patch passed with JDK v1.7.0_101.
          +1 unit 0m 16s hadoop-aws in the patch passed with JDK v1.7.0_101.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          67m 21s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-aws
            Unread field:S3AInputStream.java:[line 173]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:d1c475d
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12811427/HADOOP-13203-branch-2-005.patch
          JIRA Issue HADOOP-13203
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4849147f1ea3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / 35c6b72
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9817/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/9817/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/9817/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-aws.html
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9817/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9817/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          stevel@apache.org Steve Loughran added a comment -

HADOOP-13203 Patch 006. Explicit policies based on fadvise terminology; the configuration option uses "experimental" in its name to indicate that it is. Explicit test of random IO shows a 4x speedup over sequential IO. Also: cut back on some of the scale tests that were just doing sequential seek+read with different readahead sizes. They don't add much, just take up another 10-20s.
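
For context, a minimal sketch of selecting the option from client code; the property name comes from this issue's release note, the value strings are the policy names used in this discussion, and the bucket is a placeholder:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FadvisePolicyExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Experimental option; values discussed here: "sequential", "random", "normal".
        conf.set("fs.s3a.experimental.input.fadvise", "random");
        // Placeholder bucket; the filesystem instance picks up the policy when it is created.
        FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
        System.out.println("Opened " + fs.getUri() + " with the random input policy");
      }
    }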

          Hide
          stevel@apache.org Steve Loughran added a comment -

          HADOOP-13203 Patch 007 cleanup, including findbugs and checkstyle

          While this patch is ready for some review, there's one feature I want to write a test for and then address: a read which starts in the current requested range but which goes past it causes the stream to be closed, starting again at the new position. This can be fixed.

I plan to do it by having the read(bytes[]) return only the bytes in the current request; this meets the semantics of read(bytes[]). The readFully() calls already iterate over the read() calls, so this is handled at that level...there is no need to be clever further down.
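
For illustration, the stream contract being relied on here, as a sketch rather than the actual Hadoop code: read() may legitimately return fewer bytes than requested (for example, only what is left in the current request range), and readFully() just loops until the buffer is full:

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    final class ReadFullyExample {
      /** Loop over possibly-short read() calls until len bytes have arrived. */
      static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int done = 0;
        while (done < len) {
          int n = in.read(buf, off + done, len - done);  // may be a short read
          if (n < 0) {
            throw new EOFException("Stream ended after " + done + " of " + len + " bytes");
          }
          done += n;
        }
      }
    }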

          Hide
          stevel@apache.org Steve Loughran added a comment -

          Here is the comparison of a small sequence of forward/backward read operations, between the random and sequential policies

          "random" keeps buffer sizes in requests down to a minimum, hence no bytes wasted in close or aborts BytesReadInClose=0, BytesDiscardedInAbort=0

          "sequential" expects a read from 0-len, requests the entire range. Forward seeks can be skipped, backward seeks trigger retreat. However, the default block size (32K) is too low for any forward skip (should we change this to something like 128K?), then there's an abort, leading to the values BytesReadInClose=0, BytesDiscardedInAbort=80771308. Those abort bytes never get read in, but they do measure how oversized the request was

          testRandomIO_RandomPolicy: Random IO with policy "random"
          
          2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: random
          2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for reading.
          2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 1  ->  2
          2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1421)) - Getting path status for s3a://landsat-pds/scene_list.gz  (scene_list.gz)
          2016-06-20 20:35:15,681 [Thread-0] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 1  ->  2
          2016-06-20 20:35:15,846 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file
          2016-06-20 20:35:15,848 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
          2016-06-20 20:35:15,850 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=2097152, length=131072, requestedStreamLen=2228224, streamPosition=0, nextReadPosition=2097152
          2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 2097152-2228224 length=2228224
          2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=131072, length=131072, requestedStreamLen=262144, streamPosition=131072, nextReadPosition=131072
          2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 131072-262144 length=262144
          2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=5242880, length=65536, requestedStreamLen=5308416, streamPosition=5242880, nextReadPosition=5242880
          2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 5242880-5308416 length=5308416
          2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=1048576, length=1048576, requestedStreamLen=2097152, streamPosition=1048576, nextReadPosition=1048576
          2016-06-20 20:35:16,994 [Thread-0] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of total size 1376256 bytes: 1,141,400,611 nS
          2016-06-20 20:35:16,994 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: close() operation; streamPos=2097152, nextReadPos=0,request range 1048576-2097152 length=2097152
          2016-06-20 20:35:16,995 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 829 nS
          2016-06-20 20:35:16,996 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 1.205761 MB/S
          2016-06-20 20:35:16,997 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream Statistics
          StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=4, Aborted=0, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=164, ReadFullyOperations=4, ReadsIncomplete=160, SeekRangeOvershot=0, BytesReadInClose=0, BytesDiscardedInAbort=0}
          
          
          testRandomIO_NormalPolicy: Random IO with policy "normal"
          
          2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
          2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for reading.
          2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 1  ->  5
          2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1421)) - Getting path status for s3a://landsat-pds/scene_list.gz  (scene_list.gz)
          2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 1  ->  6
          2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file
          2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
          2016-06-20 20:35:20,598 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=2097152, length=131072, requestedStreamLen=22666811, streamPosition=0, nextReadPosition=2097152
          2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 2097152-22666811 length=22666811
          2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=131072, length=131072, requestedStreamLen=22666811, streamPosition=131072, nextReadPosition=131072
          2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 131072-22666811 length=22666811
          2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=5242880, length=65536, requestedStreamLen=22666811, streamPosition=5242880, nextReadPosition=5242880
          2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 5242880-22666811 length=22666811
          2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=1048576, length=1048576, requestedStreamLen=22666811, streamPosition=1048576, nextReadPosition=1048576
          2016-06-20 20:35:25,240 [Thread-5] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of total size 1376256 bytes: 4,641,800,492 nS
          2016-06-20 20:35:25,240 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: close() operation; streamPos=2097152, nextReadPos=0,request range 1048576-22666811 length=22666811
          2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 3,372 nS
          2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 0.296492 MB/S
          2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream Statistics
          StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=0, Aborted=4, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=158, ReadFullyOperations=4, ReadsIncomplete=154, SeekRangeOvershot=0, BytesReadInClose=0, BytesDiscardedInAbort=80771308}
          2016-06-20 20:35:25,241 [Thread-5] INFO  scale.S3AScaleTestBase (S3AScaleTestBase.java:describe(155)) - 
          
          
          Hide
          stevel@apache.org Steve Loughran added a comment -

          Patch 008; tested against s3 ireland.

          This revision has the test to demonstrate what I suspected: reads spanning block boundaries were going to have problems —and it has the fix. Which consists of always calling seekInStream(pos, len) before a read, even if targetPos==currentPos —and in that situation, closing the current stream if the currentPos is at the end of the current request range (i.e. there's no seek, but no data either). The test does block-spanning reads, on a file built up with the byte at each position being (position % 64) ... this is used in the tests to verify the bytes returned really are the bytes in the file at the specific read positions.
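
For illustration, a rough sketch of that test-data scheme; the helper names are assumptions, not the actual test code:

    final class DatasetCheck {
      /** Every byte is its position modulo 64, so any read can be checked against its offset. */
      static byte[] dataset(int len) {
        byte[] data = new byte[len];
        for (int i = 0; i < len; i++) {
          data[i] = (byte) (i % 64);
        }
        return data;
      }

      /** Verify that count bytes read starting at fileOffset match the expected pattern. */
      static void assertReadCorrect(long fileOffset, byte[] read, int count) {
        for (int i = 0; i < count; i++) {
          byte expected = (byte) ((fileOffset + i) % 64);
          if (read[i] != expected) {
            throw new AssertionError("Mismatch at offset " + (fileOffset + i)
                + ": expected " + expected + " but read " + read[i]);
          }
        }
      }
    }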

          BTW, note that some of the -Len fields in the input stream now refer to range start and finish; Len isn't appropriate now the range of the HTTP request may be less than the length of the actual blob. It was getting confusing.

          Hide
          cnauroth Chris Nauroth added a comment -

          Steve and Rajesh, this looks great to me. We'll get the best of both worlds. Thank you very much.

          All of the random vs. sequential logic looks correct to me. All tests passed for me against a bucket in US-west-2, barring the known failure related to a secret with a '+' in it, which is tracked elsewhere. I only have a few minor nitpicks on patch 008.

          1. Please add audience and stability annotations to S3AInputPolicy.

             * Optimised purely for random seek+reed/positionedRead operations;
          

          2. s/reed/read

              // Better to set it to the value requested by higher level layer.
              // In case this is set to contentLength, expect lots of connection
              // closes when backwards-seeks are executed.
              // Note that abort would force the internal connection to be
              // closed and makes it un-usable.
          

          3. I think that comment can be removed. I don't think it's relevant anymore.

              LOG.info("Stream Statistics\n{}", streamStatistics);
          

          4. I suggest changing to this for platform-agnostic line endings:

              LOG.info(String.format("Stream Statistics%n{}"), streamStatistics);
          
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 29s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 42s Maven dependency ordering for branch
          +1 mvninstall 6m 40s branch-2 passed
          +1 compile 6m 2s branch-2 passed with JDK v1.8.0_91
          +1 compile 6m 40s branch-2 passed with JDK v1.7.0_101
          +1 checkstyle 1m 22s branch-2 passed
          +1 mvnsite 1m 18s branch-2 passed
          +1 mvneclipse 0m 30s branch-2 passed
          +1 findbugs 2m 14s branch-2 passed
          +1 javadoc 0m 59s branch-2 passed with JDK v1.8.0_91
          +1 javadoc 1m 11s branch-2 passed with JDK v1.7.0_101
          0 mvndep 0m 13s Maven dependency ordering for patch
          +1 mvninstall 1m 2s the patch passed
          +1 compile 5m 53s the patch passed with JDK v1.8.0_91
          +1 javac 5m 53s the patch passed
          +1 compile 6m 31s the patch passed with JDK v1.7.0_101
          +1 javac 6m 31s the patch passed
          -1 checkstyle 1m 21s root: The patch generated 33 new + 43 unchanged - 9 fixed = 76 total (was 52)
          +1 mvnsite 1m 20s the patch passed
          +1 mvneclipse 0m 30s the patch passed
          -1 whitespace 0m 0s The patch has 50 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 findbugs 0m 47s hadoop-tools/hadoop-aws generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          +1 javadoc 1m 1s the patch passed with JDK v1.8.0_91
          +1 javadoc 1m 11s the patch passed with JDK v1.7.0_101
          +1 unit 8m 26s hadoop-common in the patch passed with JDK v1.7.0_101.
          +1 unit 0m 16s hadoop-aws in the patch passed with JDK v1.7.0_101.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          68m 21s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-aws
            Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AInputStream.contentRangeFinish; locked 75% of time Unsynchronized access at S3AInputStream.java:75% of time Unsynchronized access at S3AInputStream.java:[line 514]
            Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AInputStream.pos; locked 87% of time Unsynchronized access at S3AInputStream.java:87% of time Unsynchronized access at S3AInputStream.java:[line 497]
          JDK v1.8.0_91 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:d1c475d
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12812220/HADOOP-13203-branch-2-008.patch
          JIRA Issue HADOOP-13203
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 58e81c49cfc3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / 65b4f26
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9845/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/9845/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/9845/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-aws.html
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9845/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9845/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          stevel@apache.org Steve Loughran added a comment -

          patch 009: docs, checkstyle and findbugs

          I have not addressed those bits of the checkstyle complaints about constants called _128K in tests, as it's only testing and the name is the value.

          Hide
          stevel@apache.org Steve Loughran added a comment -

patch 010: Chris's review, plus one other IDE complaint about mixed-sync use of a field.

          test run in progress

          Hide
          stevel@apache.org Steve Loughran added a comment -

parallel test run against s3 ireland: completed in < 9 minutes; failure of TestS3AContractRootDir, which went away when run standalone ... some race condition/consistency issue to look at there

          Hide
          stevel@apache.org Steve Loughran added a comment -

          BTW, this patch enhances the range validation checks in FSInputStream so that on a block read where the length > buffer capacity, the details of the request are included in the exception. You'll appreciate this if you ever have problems here.
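
For illustration, a sketch of that style of argument check; the method name and message wording are assumptions rather than the exact FSInputStream code:

    import java.io.EOFException;

    final class ReadRangeValidation {
      /** Reject impossible positioned-read arguments, spelling out the whole request. */
      static void validate(long position, byte[] buffer, int offset, int length)
          throws EOFException {
        if (position < 0) {
          throw new EOFException("Negative position: " + position);
        }
        if (offset < 0 || length < 0 || length > buffer.length - offset) {
          throw new IndexOutOfBoundsException("Read request length " + length
              + " at buffer offset " + offset
              + " exceeds buffer capacity " + buffer.length);
        }
      }
    }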

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 27s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 15s Maven dependency ordering for branch
          +1 mvninstall 7m 9s branch-2 passed
          +1 compile 7m 2s branch-2 passed with JDK v1.8.0_91
          +1 compile 6m 46s branch-2 passed with JDK v1.7.0_101
          +1 checkstyle 1m 24s branch-2 passed
          +1 mvnsite 1m 23s branch-2 passed
          +1 mvneclipse 0m 29s branch-2 passed
          +1 findbugs 2m 14s branch-2 passed
          +1 javadoc 1m 2s branch-2 passed with JDK v1.8.0_91
          +1 javadoc 1m 14s branch-2 passed with JDK v1.7.0_101
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 0m 59s the patch passed
          +1 compile 6m 46s the patch passed with JDK v1.8.0_91
          +1 javac 6m 46s the patch passed
          +1 compile 8m 19s the patch passed with JDK v1.7.0_101
          +1 javac 8m 19s the patch passed
          -1 checkstyle 1m 33s root: The patch generated 20 new + 43 unchanged - 9 fixed = 63 total (was 52)
          +1 mvnsite 1m 40s the patch passed
          +1 mvneclipse 0m 33s the patch passed
          -1 whitespace 0m 0s The patch has 50 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 3m 2s the patch passed
          +1 javadoc 1m 15s the patch passed with JDK v1.8.0_91
          +1 javadoc 1m 21s the patch passed with JDK v1.7.0_101
          -1 unit 9m 2s hadoop-common in the patch failed with JDK v1.7.0_101.
          +1 unit 0m 17s hadoop-aws in the patch passed with JDK v1.7.0_101.
          +1 asflicense 0m 23s The patch does not generate ASF License warnings.
          76m 39s



          Reason Tests
          JDK v1.7.0_101 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics
            hadoop.security.ssl.TestReloadingX509TrustManager



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:d1c475d
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12812246/HADOOP-13203-branch-2-009.patch
          JIRA Issue HADOOP-13203
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b133a7b498c8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / 65b4f26
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9846/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/9846/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9846/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_101.txt
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9846/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9846/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          cnauroth Chris Nauroth added a comment -

          +1 for patch 010, pending pre-commit. I'll also kick off another full test run against S3.

          failure of TestS3AContractRootDir which went away when run standalone ... some race conditions/consistency condition to look at there

          If this was a run with both -Pparallel-tests and -Dtest=TestS3A*, then it's probably the problem we discussed elsewhere that passing these arguments would erroneously include TestS3AContractRootDir in the parallel testing phase.

          stevel@apache.org Steve Loughran added a comment -

          Thanks. Given the precommit went through, I'll take the +1 as confirmed and run another test against the patch rebased onto branch-2.

          Regarding the failure, it is in the serialized test set; HADOOP-13271 looks at it.

          I think it's a delayed delete visibility issue, or similar; the test is in hadoop-common, so it might need some work to isolate and handle consistency...probably initially pull it out into its own test. The fact that it doesn't fail standalone makes it hard though.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #9999 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9999/)
          HADOOP-13203 S3A: Support fadvise "random" mode for high performance (stevel: rev 4ee3543625c77c06d566fe81644d21c607d6d74d)

          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInstrumentation.java
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/TestS3AInputStreamPerformance.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AInputPolicies.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputPolicy.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractSeekTest.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSInputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
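
          For context on what the committed change enables, the sketch below is a minimal, hypothetical usage example; it is not part of the patch or of the automated message above. It assumes the fs.s3a.experimental.input.fadvise property this feature introduces, and the bucket, path, and buffer size are placeholders.

          import java.net.URI;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class S3ARandomReadSketch {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Advise S3A that access will be random (e.g. columnar-format footer/stripe
              // reads) so it can avoid full-length GETs that get aborted on seek.
              conf.set("fs.s3a.experimental.input.fadvise", "random");

              FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
              Path path = new Path("s3a://example-bucket/data/sample.orc");  // placeholder path
              long len = fs.getFileStatus(path).getLen();
              byte[] footer = new byte[1024];
              try (FSDataInputStream in = fs.open(path)) {
                // Seek near the end and read a small block; with a random-read policy this
                // backwards-seeking pattern should reuse connections rather than abort them.
                in.seek(Math.max(0, len - footer.length));
                in.readFully(footer);
              }
            }
          }

          Whether this helps depends on the workload: a sequential policy remains better for whole-file scans, while "random" targets seek-heavy readers such as ORC/Parquet clients; the site documentation changes in index.md above carry the authoritative description.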
          mmokhtar Mostafa Mokhtar added a comment -

          Sailesh Mukil FYI
          stevel@apache.org Steve Loughran added a comment -

          Mostafa: that's not really a constructive use of comments on ASF JIRAs, as it ends up sending an email to everyone who has watched or worked on a JIRA. Just point your colleagues at them. Thanks


            People

            • Assignee: rajesh.balamohan Rajesh Balamohan
            • Reporter: rajesh.balamohan Rajesh Balamohan
            • Votes: 0
            • Watchers: 9
