[IMPALA-8525] preads should use hdfsPreadFully rather than hdfsPread - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 3.4.0
Component/s: Backend
Labels: None

Description

Impala preads (enabled only when use_hdfs_pread is true) use the hdfsPread API from libhdfs, which ultimately invokes PositionedReadable#read(long position, byte[] buffer, int offset, int length) in the HDFS client.

PositionedReadable also exposes the method readFully(long position, byte[] buffer, int offset, int length). The difference is that #read will "Read up to the specified number of bytes" whereas #readFully will "Read the specified number of bytes". So there is no guarantee that #read will read all of the requested bytes.

Impala calls hdfsPread inside hdfs-file-reader.cc and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:

(1) If the underlying FileSystem does not support ByteBuffer reads (HDFS-2834), as is the case for S3A, then hdfsPread allocates a Java array equal in size to the specified length of the buffer. The call to PositionedReadable#read may fill the buffer only partially, so Impala repeats the call to hdfsPread, which triggers another large array allocation. This can waste a lot of time on unnecessary array allocations.

(2) given that Impala calls hdfsPread in a while loop, there is no point in continuously calling hdfsPread when a single call to hdfsPreadFully will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
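
The two call patterns can be compared with a small self-contained model. This is only a sketch under stated assumptions: MockFile, Pread, PreadFully, and ReadWithLoop are hypothetical names invented for illustration, not the libhdfs API or Impala's actual code, and the temp_bytes_allocated counter merely models the per-call array allocation described in (1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-in for a remote file. It is NOT libhdfs;
// it only models "read up to N bytes" versus "read exactly N bytes".
struct MockFile {
  std::vector<std::uint8_t> data;
  std::size_t max_bytes_per_read;    // cap per call, to simulate short reads
  std::size_t calls;                 // number of read calls issued
  std::size_t temp_bytes_allocated;  // models one temp array per call

  MockFile(std::size_t file_size, std::size_t max_per_read)
      : data(file_size, 0x2A), max_bytes_per_read(max_per_read),
        calls(0), temp_bytes_allocated(0) {
    assert(max_per_read > 0);
  }

  // "Read up to length bytes": may return fewer than requested; 0 at EOF.
  std::int64_t Pread(std::size_t position, std::uint8_t* buffer,
                     std::size_t length) {
    ++calls;
    temp_bytes_allocated += length;  // array sized to the requested length
    if (position >= data.size()) return 0;
    std::size_t n = std::min(std::min(length, data.size() - position),
                             max_bytes_per_read);
    std::memcpy(buffer, data.data() + position, n);
    return static_cast<std::int64_t>(n);
  }

  // "Read exactly length bytes": one call, one allocation; fails on EOF.
  bool PreadFully(std::size_t position, std::uint8_t* buffer,
                  std::size_t length) {
    ++calls;
    temp_bytes_allocated += length;
    if (position > data.size() || length > data.size() - position) {
      return false;
    }
    std::memcpy(buffer, data.data() + position, length);
    return true;
  }
};

// Caller-side retry loop, in the style the description attributes to Impala:
// keep calling Pread until the request is satisfied or EOF is reached.
std::size_t ReadWithLoop(MockFile& file, std::size_t position,
                         std::uint8_t* buffer, std::size_t length) {
  std::size_t total = 0;
  while (total < length) {
    std::int64_t n =
        file.Pread(position + total, buffer + total, length - total);
    if (n <= 0) break;  // EOF or error: stop retrying
    total += static_cast<std::size_t>(n);
  }
  return total;
}
```

In this model, a 1024-byte request against a file whose reads are capped at 256 bytes takes 4 Pread calls and 2560 bytes of modeled temporary allocation (1024 + 768 + 512 + 256), versus 1 call and 1024 bytes for PreadFully.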

Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to hdfsPreadFully the chunk-size is no longer necessary.

Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by read_size) rather than in smaller chunks (typically 128K). For example, DFSInputStream#read(long position, byte[] buffer, int offset, int length) opens up remote block readers with a byte range determined by the value of length passed into the #read call. Similarly, S3AInputStream#readFully will issue an HTTP GET request with the size of the read specified by the given length (although fadvise must be set to RANDOM for this to work).
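
As a rough worked example of the chunking cost, the sketch below counts how many positional reads are needed to cover one read_size buffer. NumReads is a hypothetical helper, and treating every read as exactly one remote request is an idealization made for illustration only.

```cpp
#include <cstdint>

// Number of positional reads needed to cover total_bytes when each read
// requests at most request_bytes (ceiling division).
constexpr std::int64_t NumReads(std::int64_t total_bytes,
                                std::int64_t request_bytes) {
  return (total_bytes + request_bytes - 1) / request_bytes;
}

constexpr std::int64_t kReadSize = 8 * 1024 * 1024;  // 8 MB, as in read_size
constexpr std::int64_t kChunkSize = 128 * 1024;      // 128K chunks

// Chunked: 64 separate reads. Unchunked: a single read for the full range.
static_assert(NumReads(kReadSize, kChunkSize) == 64, "8 MB / 128K");
static_assert(NumReads(kReadSize, kReadSize) == 1, "one full-size read");
```

Under these assumptions, reading 8 MB in 128K chunks issues 64 reads where a single full-length read would do.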

This work depends on first exposing readFully via libhdfs: HDFS-14564.

Issue Links

is related to

IMPALA-9060 Slow I/O logging is quite verbose on S3 (Resolved)

HDFS-14564 Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable (Resolved)

relates to

IMPALA-9606 ABFS reads should use hdfsPreadFully (Resolved)

People

Assignee: Sahil Takiar

Reporter: Sahil Takiar

Votes: 0

Watchers: 5

Dates

Created: 08/May/19 15:42

Updated: 02/Oct/20 00:12

Resolved: 20/Nov/19 18:08