Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
Description
We recently added a new feature to Hadoop, vectored IO, which improves read performance for seek-heavy readers. Spark jobs and other applications that use Parquet will benefit greatly from this API. Details can be found here:
https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5
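To illustrate the idea behind a vectored read, here is a minimal sketch that uses only `java.nio` rather than Hadoop itself. The caller hands over a list of byte ranges in one call and gets back one future per range, instead of issuing a seek and a read for each range in turn. The class `VectoredReadSketch`, the `Range` type, and this `readVectored` helper are hypothetical names invented for the example; they are not the Hadoop API and make no claim about how Hadoop or Parquet implements the feature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only: NOT the Hadoop API, just the same idea
// expressed with java.nio so it runs without Hadoop on the classpath.
class VectoredReadSketch {

    /** A byte range to fetch: starting offset and length (hypothetical type). */
    static final class Range {
        final long offset;
        final int length;

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Starts one asynchronous positioned read per range and returns the
     * futures in the same order as the input ranges.
     */
    static List<CompletableFuture<ByteBuffer>> readVectored(Path file, List<Range> ranges) {
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (Range r : ranges) {
            results.add(CompletableFuture.supplyAsync(() -> readRange(file, r)));
        }
        return results;
    }

    /** Reads exactly one range using positioned reads, so no shared cursor is moved. */
    private static ByteBuffer readRange(Path file, Range r) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(r.length);
            long pos = r.offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new IOException("Range extends past end of file");
                }
                pos += n;
            }
            buf.flip(); // make the filled bytes readable by the caller
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("vectored", ".bin");
        Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
        List<Range> ranges = List.of(new Range(2, 3), new Range(10, 4));
        for (CompletableFuture<ByteBuffer> f : readVectored(file, ranges)) {
            System.out.println(StandardCharsets.US_ASCII.decode(f.join()));
        }
        Files.delete(file);
    }
}
```

Running `main` prints `234` and then `abcd`. A columnar reader such as Parquet typically needs several disjoint column chunks from one file, which is why handing all ranges to the storage layer at once can help on high-latency stores; the sketch only shows the calling pattern, not any range merging or connection reuse a real implementation might do.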
Issue Links
- is blocked by
  - PARQUET-2277 Bump hadoop.version from 3.2.3 to 3.3.5 (Resolved)
- is depended upon by
  - SPARK-44116 Utilize Hadoop vectorized APIs (Open)
- is related to
  - HADOOP-19101 Vectored Read into off-heap buffer broken in fallback implementation (Resolved)
  - HADOOP-19098 Vector IO: consistent specified rejection of overlapping ranges (Resolved)
- is superseded by
  - PARQUET-2486 Improve Parquet IO Performance within cloud datalakes (In Progress)
- relates to
  - HADOOP-18103 High performance vectored read API in Hadoop (Resolved)
- links to