[HADOOP-14965] s3a input stream "normal" fadvise mode to be adaptive - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.8.1
Fix Version/s: 3.1.0, 3.0.1
Component/s: fs/s3
Labels: None
Target Version/s: 3.1.0, 2.10.0, 3.0.1

Description

~~HADOOP-14535~~ added seek optimisation to wasb. Rather than requiring the caller to declare sequential vs random IO, it works this out for itself:

- It defaults to sequential IO with lazy seek.
- If the caller ever seeks backwards, it switches to random IO.

This means that the typical access pattern of columnar stores (go to the end of the file, read the summary, then go to the columns and work forwards) will switch to random IO after that first seek back (cost: one aborted HTTP connection).
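
The behaviour described above can be sketched as a small state machine. This is a minimal illustration only, not the wasb or S3A code: the class `AdaptiveSeekPolicy`, its `seekAndRead` method, and the way an aborted connection is counted are all hypothetical names and simplifications chosen for this example.

```java
/** Hypothetical sketch of an adaptive seek policy; not the actual stream implementation. */
class AdaptiveSeekPolicy {
    enum Mode { SEQUENTIAL, RANDOM }

    private Mode mode = Mode.SEQUENTIAL;   // assumption: start in sequential mode
    private long pos = 0;                  // position after the last read
    private boolean connectionOpen = false;
    private int abortedConnections = 0;

    Mode mode() { return mode; }
    int abortedConnections() { return abortedConnections; }

    /** Model a lazy seek to target followed by a read of len bytes. */
    void seekAndRead(long target, long len) {
        if (mode == Mode.SEQUENTIAL && target < pos) {
            // First backward seek: the open sequential request is of no further use.
            if (connectionOpen) {
                abortedConnections++;
            }
            mode = Mode.RANDOM;
        }
        connectionOpen = true;   // the read (re)opens a connection as needed
        pos = target + len;
    }

    public static void main(String[] args) {
        AdaptiveSeekPolicy p = new AdaptiveSeekPolicy();
        p.seekAndRead(990, 10);  // go to end of file, read summary
        p.seekAndRead(100, 50);  // seek back to the first column
        p.seekAndRead(300, 50);  // work forwards
        System.out.println(p.mode() + " aborted=" + p.abortedConnections());
    }
}
```

In this model the columnar pattern pays for exactly one aborted connection, at the first seek back, and every later read runs in random mode.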

Where this should benefit the most is in downstream apps which work with different data sources in the same object store, or run off the same app config, but have different read patterns. I'm seeing exactly this in some of my Spark tests, where it's near impossible to set things up so that .gz files are read sequentially but ORC data is read with random IO.

I propose that the "normal" fadvise mode becomes adaptive, "sequential" stays sequential always, and "random" is random from the outset.
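
As a rough sketch of the proposed mapping, assuming the three mode names are passed as plain strings: the class `FadviseModes` and its methods are hypothetical helpers for illustration, not part of the S3A API.

```java
import java.util.Locale;

/** Hypothetical helper showing how the three fadvise modes would behave under the proposal. */
class FadviseModes {
    enum Policy { SEQUENTIAL, RANDOM }

    /** Policy in force when the stream is first opened. */
    static Policy initialPolicy(String fadvise) {
        switch (fadvise.toLowerCase(Locale.ROOT)) {
            case "normal":       // adaptive: starts out sequential
            case "sequential":
                return Policy.SEQUENTIAL;
            case "random":
                return Policy.RANDOM;
            default:
                throw new IllegalArgumentException("unknown fadvise mode: " + fadvise);
        }
    }

    /** Policy in force after the caller's first backward seek. */
    static Policy afterBackwardSeek(String fadvise) {
        Policy initial = initialPolicy(fadvise);
        boolean adaptive = "normal".equals(fadvise.toLowerCase(Locale.ROOT));
        // Only "normal" adapts; the other two modes keep their initial policy.
        return adaptive ? Policy.RANDOM : initial;
    }

    public static void main(String[] args) {
        for (String m : new String[] {"normal", "sequential", "random"}) {
            System.out.println(m + ": " + initialPolicy(m) + " -> " + afterBackwardSeek(m));
        }
    }
}
```

Only "normal" changes behaviour at runtime; callers that know their read pattern can still pin it with "sequential" or "random".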

Attachments

- HADOOP-14965-001.patch, 25/Oct/17 12:00, 8 kB, Steve Loughran
- HADOOP-14965-002.patch, 26/Oct/17 15:18, 7 kB, Steve Loughran
- HADOOP-14965-003.patch, 23/Nov/17 17:32, 7 kB, Steve Loughran
- HADOOP-14965-004.patch, 12/Dec/17 14:43, 7 kB, Steve Loughran

Issue Links

is related to

HADOOP-13203 S3A: Support fadvise "random" mode for high performance readPositioned() reads

Resolved

links to

GitHub Pull Request #283

People

Assignee:: Steve Loughran

Reporter:: Steve Loughran

Votes:: 0

Watchers:: 9

Dates

Created:: 19/Oct/17 09:55

Updated:: 25/Mar/22 18:20

Resolved:: 20/Dec/17 18:29