[PARQUET-2171] Implement vectored IO in parquet file format - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.14.0
Component/s: parquet-mr
Labels: None

Description

We recently added a new feature to Hadoop, vectored IO, which improves read performance for seek-heavy readers. Spark jobs and other applications that read Parquet files will benefit greatly from this API. Details can be found here:

https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5

https://issues.apache.org/jira/browse/HADOOP-18103

https://issues.apache.org/jira/browse/HADOOP-11867
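
For readers unfamiliar with the idea, here is a minimal sketch of what a vectored read looks like from the caller's side: the reader hands over all the byte ranges it needs at once and gets back one future per range, instead of issuing a seek and a blocking read for each. This is an illustration only, written against the plain JDK. The names `VectoredReadSketch`, `Range`, and `readVectored` below are hypothetical and are not the Hadoop or parquet-mr implementation; the real API is defined in the Hadoop links above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class VectoredReadSketch {

    /** Hypothetical (offset, length) request; stands in for a file range. */
    static final class Range {
        final long offset;
        final int length;

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Starts one asynchronous positional read per range and returns the
     * futures in the same order as the requested ranges.
     */
    static List<CompletableFuture<ByteBuffer>> readVectored(FileChannel ch, List<Range> ranges) {
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (Range r : ranges) {
            results.add(CompletableFuture.supplyAsync(() -> {
                ByteBuffer buf = ByteBuffer.allocate(r.length);
                try {
                    long pos = r.offset;
                    // Positional reads do not move the channel's own position,
                    // so several ranges can be read concurrently.
                    while (buf.hasRemaining()) {
                        int n = ch.read(buf, pos);
                        if (n < 0) {
                            throw new EOFException("range extends past end of file");
                        }
                        pos += n;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                buf.flip();
                return buf;
            }));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vectored", ".bin");
        Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Range> ranges = List.of(new Range(2, 3), new Range(10, 4));
            for (CompletableFuture<ByteBuffer> f : readVectored(ch, ranges)) {
                // join() waits for that range only; prints "234" then "abcd"
                System.out.println(StandardCharsets.US_ASCII.decode(f.join()));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A columnar reader typically knows up front which column chunks it needs, which is why handing the whole list to the IO layer in one call can help: the layer is then free to run the reads in parallel or combine nearby ranges. The sketch only runs them in parallel and does no combining.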

Issue Links

is blocked by

PARQUET-2277 Bump hadoop.version from 3.2.3 to 3.3.5

Resolved

is depended upon by

SPARK-44116 Utilize Hadoop vectorized APIs

Open

is related to

HADOOP-19101 Vectored Read into off-heap buffer broken in fallback implementation

Resolved

HADOOP-19098 Vector IO: consistent specified rejection of overlapping ranges

Resolved

is superseded by

PARQUET-2486 Improve Parquet IO Performance within cloud datalakes

In Progress

relates to

HADOOP-18103 High performance vectored read API in Hadoop

Resolved

links to

GitHub Pull Request #1103

GitHub Pull Request #1139

GitHub Pull Request #1330

People

Assignee: Steve Loughran

Reporter: Mukund Thakur

Votes: 1

Watchers: 13

Dates

Created: 10/Aug/22 22:43

Updated: 23/Jun/24 03:32

Resolved: 26/Apr/24 01:46