[DRILL-4977] Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.0
Fix Version/s: None
Component/s: Storage - Parquet
Labels: None
Environment: Hadoop 3.0

Description

When using the new fs.s3a.experimental.input.fadvise=random mode to access Parquet files stored in S3, we see a significant improvement in query performance but a slowdown in query planning. The slowdown comes from the way the metadata file is read: each 8000-byte chunk generates a new GET request to S3. Calling FSDataInputStream.setReadahead(metadata-filesize) to indicate that the whole file will be read circumvents this behaviour.
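To give a rough sense of scale, the sketch below models the request count described above. It is a simplified illustration, not Drill or Hadoop code: the class `RequestCountSketch`, the method `countGets`, and the 1,000,000-byte file size are hypothetical, and the model assumes one GET per block of bytes fetched, which ignores seeks and any stream reuse in the real S3A client. Only the 8000-byte chunk size and the idea of setting the readahead to the metadata file size come from the description.

```java
// Simplified model only: counts ranged GETs as ceil(fileSize / bytesPerRequest).
// Assumption: each fetched block costs exactly one GET request to S3.
final class RequestCountSketch {
    // Chunk size quoted in the issue description.
    static final long CHUNK_BYTES = 8_000L;

    // Number of GET requests needed to read fileSize bytes when each
    // request fetches at most bytesPerRequest bytes.
    static long countGets(long fileSize, long bytesPerRequest) {
        if (fileSize < 0 || bytesPerRequest <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Integer ceiling division.
        return (fileSize + bytesPerRequest - 1) / bytesPerRequest;
    }

    public static void main(String[] args) {
        long metadataFileSize = 1_000_000L; // hypothetical metadata file size

        // Chunked reads: one GET per 8000-byte chunk.
        System.out.println("chunked:    " + countGets(metadataFileSize, CHUNK_BYTES));

        // Readahead set to the whole file, as proposed in the description.
        System.out.println("whole file: " + countGets(metadataFileSize, metadataFileSize));
    }
}
```

Under these assumptions a 1,000,000-byte metadata file costs 125 requests when read in 8000-byte chunks and a single request when the readahead covers the whole file, which is consistent with the planning slowdown growing with the size of the metadata file.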

Issue Links

relates to: DRILL-6540 Upgrade to HADOOP-3.0 libraries (Closed)

People

Assignee: Unassigned

Reporter: Uwe Korn

Votes: 0

Watchers: 1

Dates

Created: 28/Oct/16 08:05

Updated: 22/Feb/19 07:00