[IMPALA-8523] Migrate hdfsOpen to builder-based openFile API

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels: None

Description

When opening files via libhdfs, we call hdfsOpen, which ultimately calls FileSystem#open(Path f, int bufferSize). As of HADOOP-15229, the HDFS client exposes a new API for opening files called openFile. The new API has two advantages: (1) it can specify file-specific configuration values in a builder-based manner (see o.a.h.fs.FSBuilder for details), and (2) it can open files asynchronously (see o.a.h.fs.FutureDataInputStreamBuilder for details).

The async file opens are similar to IMPALA-7738 (Implement timeouts for HDFS open calls). To avoid overlap between IMPALA-7738 and the async file opens in openFile, HADOOP-15691 can be used to check which filesystems open files asynchronously and which ones don't (currently only S3A opens files asynchronously).
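To make the relationship between async opens and open timeouts concrete, here is a minimal sketch in plain Java. It relies on the fact that the builder's build() call returns a CompletableFuture; everything else (the class name OpenWithTimeout, the method await, and the choice to return an empty Optional on timeout) is hypothetical and only for illustration. It is not how Impala or libhdfs implements timeouts.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds the wait on a pending asynchronous open. */
final class OpenWithTimeout {
    private OpenWithTimeout() {}

    /**
     * Waits up to timeoutMillis for the pending open to finish.
     * Returns the result if it arrives in time, or an empty Optional
     * if the wait timed out or the calling thread was interrupted.
     */
    static <T> Optional<T> await(CompletableFuture<T> pending, long timeoutMillis) {
        try {
            return Optional.ofNullable(pending.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Marks the future as cancelled; this does not by itself
            // interrupt whatever work is producing the result.
            pending.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            pending.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The open itself failed; surface the underlying cause unchecked.
            throw new CompletionException(e.getCause());
        }
    }
}
```

With a filesystem that opens files asynchronously, the future passed to await would be the one returned by the openFile builder. For filesystems that open synchronously, the future is presumably already complete by the time it is returned, so a timeout applied this way would not bound the open itself.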

The main use case for the new openFile API is Impala-S3 performance. Performance benchmarks have shown that setting fs.s3a.experimental.input.fadvise to RANDOM for Parquet files can significantly improve performance. However, this setting also adversely affects scans of non-splittable file formats such as gzipped files (see HADOOP-13203). One solution is to just document that setting fs.s3a.experimental.input.fadvise to RANDOM for Parquet improves performance. A better solution would be to use the new openFile API to specify different values of fadvise depending on the file type.
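The per-file-type choice of fadvise could look roughly like the sketch below. The configuration key is the one named above and the policy values are S3A's documented input policies; the class FadvisePolicy, the method forPath, and the use of file extensions to detect the format are assumptions made for illustration (Impala would more likely use the file format recorded in table metadata).

```java
import java.util.Locale;

/** Hypothetical helper: chooses an S3A input policy per file. */
final class FadvisePolicy {
    // S3A configuration key named in the issue description.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    private FadvisePolicy() {}

    /**
     * Returns "random" for Parquet files, "sequential" for gzipped files,
     * and "normal" for anything else. Matching on the file extension is an
     * assumption of this sketch, not something the issue specifies.
     */
    static String forPath(String path) {
        String name = path.toLowerCase(Locale.ROOT);
        if (name.endsWith(".parquet")) {
            return "random";      // columnar reads seek around the file
        }
        if (name.endsWith(".gz")) {
            return "sequential";  // non-splittable, read front to back
        }
        return "normal";
    }

    // On the Java side, the chosen value would be supplied per file as a
    // builder option, along the lines of:
    //   fs.openFile(path).opt(FADVISE_KEY, FadvisePolicy.forPath(name)).build()
}
```

Passing the value as a per-file builder option, rather than setting it once in the filesystem configuration, is what lets Parquet scans use RANDOM without penalizing gzipped files opened through the same filesystem.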

This work depends on exposing the new openFile API via libhdfs (HDFS-14478).

Issue Links

depends upon: HDFS-14478 Add libhdfs APIs for openFile (Resolved)

People

Assignee: Sahil Takiar

Reporter: Sahil Takiar

Votes: 0

Watchers: 5

Dates

Created: 08/May/19 15:12

Updated: 08/May/19 15:46