[IMPALA-9606] ABFS reads should use hdfsPreadFully - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 4.0.0
Component/s: Backend
Labels:
None

Description

In IMPALA-8525, HDFS preads were enabled by default when reading data from S3. IMPALA-8525 deferred enabling preads for ABFS because they did not significantly improve performance. After further investigation into the ABFS input streams, I think it is safe to use hdfsPreadFully for ABFS reads.

The ABFS client fetches data differently from S3A. The details are beyond the scope of this JIRA, but the difference is related to an ABFS feature called "read-aheads": ABFS pre-fetches data it expects the client to request. By default, it pre-fetches (number of cores) × 4 MB of data. If the requested data is already in the client cache, it is read from the cache.
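To illustrate the read-ahead model, here is a toy Python sketch. It is not the ABFS code: the class `ToyReadAheadStream`, the helper `default_readahead_bytes`, and the policy of pre-fetching the blocks that follow a cache miss are all hypothetical. Only the (number of cores) × 4 MB budget comes from the description above. The sketch shows why many requests never reach the remote store: a miss pulls in several 4 MB blocks, and later reads in that range are served from the client cache.

```python
import os

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per read-ahead block (per the description)


def default_readahead_bytes(cores=None):
    """Hypothetical helper: pre-fetch budget of (# cores) * 4 MB."""
    if cores is None:
        cores = os.cpu_count() or 1
    return cores * BLOCK_SIZE


class ToyReadAheadStream:
    """Hypothetical model of a client-side read-ahead cache (not ABFS code).

    On a cache miss it fetches the requested block plus the blocks that
    follow it, up to the pre-fetch budget, from the 'remote' bytes.
    """

    def __init__(self, remote: bytes, cores: int):
        self.remote = remote
        self.max_blocks = default_readahead_bytes(cores) // BLOCK_SIZE
        self.cache = {}          # block index -> block bytes
        self.remote_fetches = 0  # number of simulated remote requests

    def _fetch_block(self, idx: int) -> None:
        start = idx * BLOCK_SIZE
        self.remote_fetches += 1
        self.cache[idx] = self.remote[start:start + BLOCK_SIZE]

    def read(self, position: int, length: int) -> bytes:
        out = bytearray()
        while length > 0 and position < len(self.remote):
            idx = position // BLOCK_SIZE
            if idx not in self.cache:
                # Miss: fetch this block and pre-fetch the ones after it.
                last = (len(self.remote) - 1) // BLOCK_SIZE
                for i in range(idx, min(idx + self.max_blocks, last + 1)):
                    if i not in self.cache:
                        self._fetch_block(i)
            block = self.cache[idx]
            offset = position - idx * BLOCK_SIZE
            chunk = block[offset:offset + length]
            out += chunk
            position += len(chunk)
            length -= len(chunk)
        return bytes(out)
```

For example, with 2 cores a first read at offset 0 fetches blocks 0 and 1, so a subsequent read inside block 1 is a cache hit and issues no new remote request.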

However, there is no real drawback to using hdfsPreadFully for ABFS reads, and it is safer: although the current ABFS implementation always returns the full amount of requested data, only the hdfsPreadFully API guarantees this.
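The difference between the two APIs can be sketched as follows. hdfsPread returns the number of bytes actually read, which may be fewer than requested, whereas hdfsPreadFully guarantees the full amount. The Python below is a hypothetical model of that contract, not libhdfs: `ShortReadFile` simulates a stream that returns short reads, and `pread_fully` is the retry loop a caller of hdfsPread would otherwise have to write. Raising `EOFError` on a premature end of file is a choice made by this sketch.

```python
class ShortReadFile:
    """Hypothetical file whose positional read may return fewer bytes than
    requested, as a caller of hdfsPread must be prepared for."""

    def __init__(self, data: bytes, max_chunk: int):
        self.data = data
        self.max_chunk = max_chunk

    def pread(self, position: int, length: int) -> bytes:
        # Returns at most max_chunk bytes; an empty result means end of file.
        return self.data[position:position + min(length, self.max_chunk)]


def pread_fully(f, position: int, length: int) -> bytes:
    """Keep issuing positional reads until exactly `length` bytes are read,
    in the spirit of hdfsPreadFully (hypothetical helper)."""
    out = bytearray()
    while len(out) < length:
        chunk = f.pread(position + len(out), length - len(out))
        if not chunk:
            raise EOFError(f"wanted {length} bytes, got {len(out)} before EOF")
        out += chunk
    return bytes(out)
```

A single `pread` call on a file with `max_chunk=3` returns only 3 bytes even when 8 are requested, while `pread_fully` loops until all 8 arrive. Code that assumes `pread` always fills the buffer is correct only as long as the underlying filesystem happens never to return short reads.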

Issue Links

is related to: IMPALA-8525 preads should use hdfsPreadFully rather than hdfsPread (Resolved)

People

Assignee: Sahil Takiar

Reporter: Sahil Takiar

Votes: 0

Watchers: 2

Dates

Created: 06/Apr/20 03:41

Updated: 02/Oct/20 00:13

Resolved: 02/Oct/20 00:13