[HADOOP-11867] Add a high-performance vectored read API. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.3.5
Component/s: fs, fs/azure, fs/s3, hdfs-client
Labels:
- performance
- pull-request-available

Description

The most effective way to read from a filesystem efficiently is to let the FileSystem implementation handle the seek behaviour underneath the API, so that it can be as efficient as possible.

A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time.

This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation.
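As a rough illustration of what "scheduling/planning the reads ahead of time" could mean, the sketch below sorts the requested ranges and merges those separated by only a small gap, so that one larger read replaces several seeks. The class ReadPlanner, the Range type and the maxGap parameter are hypothetical names chosen for this example; they are not taken from the issue or from the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not a Hadoop class): plans reads by merging nearby ranges. */
final class ReadPlanner {

    /** A byte range [offset, offset + length) within a file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /**
     * Sorts the requested ranges by offset and merges any range that starts
     * within maxGap bytes of the end of the previous one (or overlaps it).
     * maxGap is an assumed tuning knob: reading a small gap is taken to be
     * cheaper than issuing another seek.
     */
    static List<Range> coalesce(List<Range> requested, long maxGap) {
        List<Range> sorted = new ArrayList<>(requested);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGap) {
                    // Extend the previous range to cover this one as well.
                    long end = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1, new Range(last.offset, end - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

A caller would pass the ranges it needs, read each merged range once, and then slice the individual results back out of the larger buffers.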

For seek+read systems with even more latency than locally-attached disks, something like a readFully(long[] offsets, ByteBuffer[] chunks) would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be slice()d off a bigger buffer).

The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override it in any way.
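A minimal sketch of such a fallback is shown below. It uses java.nio's SeekableByteChannel as a stand-in for FSDataInputStream so that the example is self-contained; the class name SeekingVectoredReader is hypothetical, and the method only mirrors the readFully(long[] offsets, ByteBuffer[] chunks) shape proposed above, not any API that was eventually committed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical fallback (not a Hadoop class): one seek plus read loop per chunk. */
final class SeekingVectoredReader {

    /**
     * For each i, positions the channel at offsets[i] and fills chunks[i]
     * until it has no remaining space, i.e. reads chunk.remaining() bytes.
     */
    static void readFully(SeekableByteChannel in, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        if (offsets.length != chunks.length) {
            throw new IllegalArgumentException("offsets and chunks must have the same length");
        }
        for (int i = 0; i < offsets.length; i++) {
            in.position(offsets[i]);          // the "seek"
            ByteBuffer chunk = chunks[i];
            while (chunk.hasRemaining()) {    // a single read() may return fewer bytes
                if (in.read(chunk) < 0) {
                    throw new EOFException("End of stream while reading range at offset " + offsets[i]);
                }
            }
        }
    }
}
```

Because the loop only relies on seek and read, a filesystem that cannot do better would get this behaviour for free, while one that can plan or parallelise the reads could override it.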

Issue Links

is depended upon by

HADOOP-15963 Add ABFS support for Async Scatter/Gather IO

Open

HADOOP-15964 Add S3A support for Async Scatter/Gather IO

Resolved

ORC-1251 Use Hadoop Vectored IO

Closed

is fixed by

HADOOP-18315 Fix 3.3 build problems caused by backport of HADOOP-11867.

Open

is related to

HADOOP-18391 Improve VectoredReadUtils#readVectored() for direct buffers

Resolved

HADOOP-16241 S3AInputStream PositionReadable should perform ranged read on dedicated stream

Open

relates to

HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API.

Resolved

HDFS-3051 A zero-copy ScatterGatherRead api from FSDataInputStream

Open

HADOOP-9565 Add a Blobstore interface to add to blobstore FileSystems

Patch Available

links to

GitHub Pull Request #1830

GitHub Pull Request #3499

GitHub Pull Request #3904

People

Assignee: Mukund Thakur

Reporter: Gopal Vijayaraghavan

Votes: 0

Watchers: 47

Dates

Created: 22/Apr/15 17:00

Updated: 25/Oct/23 20:27

Resolved: 09/Jan/23 18:35

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 13h