Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
3.0.0
Description
The most effective way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API, so that it can be as efficient as possible.
A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time.
This is especially useful for seek-heavy readers on HDFS, since it lets the FSDataInputStream implementation potentially optimize away the seek gaps between reads.
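To illustrate what "planning the reads ahead of time" could mean, here is a minimal Java sketch under stated assumptions: the implementation receives all requested ranges at once, sorts them by offset, and merges any two whose gap is at most a threshold, so that one longer read replaces several seek+read pairs. The class `ReadPlanner`, its nested `Range` type and the `maxGap` parameter are hypothetical names for illustration, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical read planner: sorts requested ranges and coalesces nearby ones. */
final class ReadPlanner {

    /** A half-open byte range [offset, offset + length) of the file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    private ReadPlanner() {}

    /**
     * Sorts ranges by offset, then merges any range that starts no more than
     * maxGap bytes after the end of the current merged range. Bytes in the
     * gap would be read and discarded, trading extra I/O for one fewer seek.
     */
    static List<Range> coalesce(List<Range> requested, long maxGap) {
        List<Range> sorted = new ArrayList<>(requested);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGap) {
                    // Extend the previous read to cover this range (also handles overlap).
                    long newEnd = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1, new Range(last.offset, newEnd - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

For example, with `maxGap = 64` the requests [1000, 1010), [0, 100) and [120, 170) collapse into two reads, [0, 170) and [1000, 1010): the 20-byte gap is read and thrown away, while the 830-byte gap still costs a seek. The threshold is a trade-off between wasted bytes and seek latency, so a store with higher per-request latency would plausibly use a larger value.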
For seek+read systems with even higher latency than locally attached disks, a call such as readFully(long[] offsets, ByteBuffer[] chunks) would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (each chunk may be sliced off a bigger buffer with {{slice()}}).
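On the caller side, the chunks for such a call could be carved out of one larger buffer so that each chunk's remaining() equals the number of bytes wanted at the matching offset. The following is a sketch assuming only the JDK's ByteBuffer API; `ChunkSlicer` and `sliceChunks` are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that carves per-range chunks out of one backing buffer. */
final class ChunkSlicer {
    private ChunkSlicer() {}

    /**
     * Returns one slice per requested length, laid out back to back in
     * backing. Each slice has position 0 and remaining() == lengths[i], and
     * shares storage with backing, so filling a chunk fills the big buffer.
     */
    static ByteBuffer[] sliceChunks(ByteBuffer backing, int[] lengths) {
        ByteBuffer[] chunks = new ByteBuffer[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (pos + lengths[i] > backing.capacity()) {
                throw new IllegalArgumentException("backing buffer too small");
            }
            // duplicate() shares content but has its own position and limit.
            ByteBuffer view = backing.duplicate();
            view.limit(pos + lengths[i]); // set limit first so position <= limit holds
            view.position(pos);
            chunks[i] = view.slice();
            pos += lengths[i];
        }
        return chunks;
    }
}
```

Slicing keeps a single allocation for the whole request while still giving the reader one independent position/limit pair per range.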
The base implementation can stub this in as a sequence of seek + read() calls into the ByteBuffers, so that no FS implementation is forced to override it.
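A fallback of that kind might look like the following sketch. It assumes the source can be modelled as a JDK SeekableByteChannel (position() as the seek, read() as the read); `SequentialScatterRead` is a hypothetical name, and a real FileSystem stream would use its own seek/read methods instead.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/**
 * Hypothetical fallback for readFully(long[] offsets, ByteBuffer[] chunks):
 * a plain sequence of seek + read calls, usable by any seekable source.
 */
final class SequentialScatterRead {
    private SequentialScatterRead() {}

    static void readFully(SeekableByteChannel channel, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        if (offsets.length != chunks.length) {
            throw new IllegalArgumentException("offsets and chunks differ in length");
        }
        for (int i = 0; i < offsets.length; i++) {
            channel.position(offsets[i]); // the "seek"
            ByteBuffer chunk = chunks[i];
            // Keep reading until chunk.remaining() bytes have been transferred.
            while (chunk.hasRemaining()) {
                if (channel.read(chunk) < 0) {
                    throw new EOFException("EOF before filling chunk " + i + " at offset " + offsets[i]);
                }
            }
        }
    }
}
```

Because this fallback only needs seek and read, every filesystem gets correct behaviour for free; stores that can do better (parallel or coalesced ranged reads) would override it.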
Issue Links
- is depended upon by
  - HADOOP-15963 Add ABFS support for Async Scatter/Gather IO (Open)
  - HADOOP-15964 Add S3A support for Async Scatter/Gather IO (Resolved)
  - ORC-1251 Use Hadoop Vectored IO (Closed)
- is fixed by
  - HADOOP-18315 Fix 3.3 build problems caused by backport of HADOOP-11867. (Open)
- is related to
  - HADOOP-18391 Improve VectoredReadUtils#readVectored() for direct buffers (Resolved)
  - HADOOP-16241 S3AInputStream PositionReadable should perform ranged read on dedicated stream (Open)
- relates to
  - HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. (Resolved)
  - HDFS-3051 A zero-copy ScatterGatherRead api from FSDataInputStream (Open)
  - HADOOP-9565 Add a Blobstore interface to add to blobstore FileSystems (Patch Available)