Description
Parquet, Avro, etc. are still stuck building against older Hadoop releases.
This makes using new APIs hard (PARQUET-2171) and means that APIs which are five years old, such as the openFile() API of HADOOP-15229, just aren't picked up.
This lack of openFile() adoption hurts working with files in cloud storage because:
- extra HEAD requests are made
- read policies can't be explicitly set
- split start/end can't be passed down
HADOOP-18679 added a new WrappedIO class.
This JIRA proposes extending that class with:
- more of the filesystem/input stream methods
- IOStatistics
- Pulling in Parquet's DynMethods to dynamically wrap and invoke the methods through tests. The resulting class, DynamicWrappedIO, is intended to be copied into libraries (Parquet, Iceberg) for their own use.
- Updating existing tests to use the dynamic binding for end-to-end testing.
Then get this into the downstream libraries and use it where appropriate.
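To illustrate the kind of dynamic binding described above, here is a minimal sketch using only JDK reflection. It is not the DynMethods or DynamicWrappedIO code; the names DynamicBindingSketch, OptionalMethod and load are hypothetical, and java.lang.String stands in for a Hadoop class so the example runs without Hadoop on the classpath. The idea is that a library built against an older release looks a method up by name at runtime and falls back when it is absent.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/**
 * Sketch of reflection-based dynamic binding (hypothetical names throughout;
 * not the Hadoop or Parquet implementation).
 */
public final class DynamicBindingSketch {

  /** Handle to a method which may or may not exist at runtime. */
  public static final class OptionalMethod {
    private final Method method; // null when the lookup failed

    private OptionalMethod(Method method) {
      this.method = method;
    }

    /** @return true if the class and method were found. */
    public boolean available() {
      return method != null;
    }

    /** Invoke the method, unwrapping any exception raised by the target. */
    public Object invoke(Object target, Object... args) {
      if (method == null) {
        throw new UnsupportedOperationException("method not found at runtime");
      }
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        Throwable cause = e.getCause();
        if (cause instanceof RuntimeException) {
          throw (RuntimeException) cause;
        }
        throw new IllegalStateException(cause);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  /**
   * Look up a public method by class and method name.
   * A missing class or method yields an unavailable handle, not an exception.
   */
  public static OptionalMethod load(String className, String methodName,
      Class<?>... paramTypes) {
    try {
      Class<?> clazz = Class.forName(className);
      return new OptionalMethod(clazz.getMethod(methodName, paramTypes));
    } catch (ReflectiveOperationException | LinkageError e) {
      return new OptionalMethod(null);
    }
  }

  public static void main(String[] args) {
    // A method which exists on every JDK.
    OptionalMethod length = load("java.lang.String", "length");
    // A method which does not exist: stands in for an API missing from an older release.
    OptionalMethod missing = load("java.lang.String", "noSuchMethod");

    System.out.println("length available: " + length.available());
    System.out.println("\"hello\".length() = " + length.invoke("hello"));
    System.out.println("noSuchMethod available: " + missing.available());
  }
}
```

A caller would check available() once, cache the handle, and take the classic code path (for example a plain open() call) when the newer method is not present.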
Issue Links
- causes
  - HADOOP-19285 [ABFS] Restore ETAGS_AVAILABLE to abfs path capabilities (Resolved)
- depends upon
  - HADOOP-18679 Add BulkDelete API for paged delete of files and objects (Resolved)
- is depended upon by
  - PARQUET-2493 HadoopInputFile to pass down FileStatus when opening file. (Open)
  - PARQUET-2486 Improve Parquet IO Performance within cloud datalakes (In Progress)
- is related to
  - HADOOP-19199 Include FileStatus when opening a file from FileSystem (Resolved)
- links to