Details
- Improvement
- Status: In Progress
- Major
- Resolution: Unresolved
- 3.4.0, 3.3.6
- None
Description
Parquet, Avro, etc. are still stuck building against older Hadoop releases.
This makes using new APIs hard (PARQUET-2171) and means that APIs which are five years old, such as HADOOP-15229, just aren't picked up.
This lack of openFile() adoption hurts working with files in cloud storage, as:
- extra HEAD requests are made
- read policies can't be explicitly set
- split start/end can't be passed down
HADOOP-18679 added a new WrappedIO class.
This JIRA proposes extending it with
- more of the filesystem/input stream methods
- IOStatistics
- Parquet's DynMethods, pulled in to dynamically wrap and invoke these methods in tests. The resulting class, DynamicWrappedIO, is intended to be copied into libraries (Parquet, Iceberg) for their own use.
- existing tests updated to use the dynamic binding for end-to-end testing.
Then get this into the downstream libraries and use it where appropriate.
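As a rough illustration of the dynamic binding idea, the sketch below looks up a static method by class name at runtime and reports absence instead of failing at link time. It is not the Parquet DynMethods code or the proposed DynamicWrappedIO class. The class name DynamicBindingSketch and its helper methods are hypothetical, and the WrappedIO package name is an assumption that is only resolved if a recent Hadoop release happens to be on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

/** Hypothetical sketch of reflection-based binding; not the real DynamicWrappedIO. */
final class DynamicBindingSketch {

  /** Assumed fully qualified name of the class added by HADOOP-18679. */
  static final String WRAPPED_IO = "org.apache.hadoop.io.wrappedio.WrappedIO";

  /** True if the named class can be loaded from the current classpath. */
  static boolean isAvailable(String className) {
    try {
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Returns the public static method, or empty if the class or method is missing. */
  static Optional<Method> findStatic(String className, String name, Class<?>... argTypes) {
    try {
      Method m = Class.forName(className).getMethod(name, argTypes);
      return Modifier.isStatic(m.getModifiers()) ? Optional.of(m) : Optional.empty();
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      return Optional.empty();
    }
  }

  /** Invokes a bound static method, rethrowing the target's own exception. */
  static Object invokeStatic(Method m, Object... args) throws Exception {
    try {
      return m.invoke(null, args);
    } catch (InvocationTargetException e) {
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    // A JDK method stands in for a wrapped Hadoop call so the sketch runs anywhere.
    Method max = findStatic("java.lang.Math", "max", int.class, int.class).get();
    System.out.println(invokeStatic(max, 3, 7)); // prints 7

    // Result depends on whether a Hadoop release with WrappedIO is on the classpath.
    System.out.println("WrappedIO available: " + isAvailable(WRAPPED_IO));
  }
}
```

A library built against an older Hadoop release could probe once at startup in this way and fall back to the classic open() path when the lookup returns empty.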
Issue Links
- depends upon
  - HADOOP-18679 Add API for bulk/paged delete of files and objects (Resolved)
- is depended upon by
  - PARQUET-2493 HadoopInputFile to pass down FileStatus when opening file. (Open)
  - PARQUET-2486 Improve Parquet IO Performance within cloud datalakes (In Progress)
- is related to
  - HADOOP-19199 Include FileStatus when opening a file from FileSystem (Resolved)
- links to