Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 3.3.2
- None
- None
Description
Calibrate S3A input stream performance against recent applications and data formats, and improve it where necessary.
HADOOP-18028 is a key part of this, but there are other issues/opportunities:
- We could add machine-parsable trace-level logging in FSDataInputStream to record how the stream APIs are invoked, then collect that data from real apps and analyze it.
- Implement those APIs which some apps use (ByteBufferPositionedReadable), not so much for the direct implementation as to get better information from the app about its read plan.
- The `normal` mode doesn't switch from sequential on forward seeks. Is that always appropriate?
- Choose different buffering options when doing whole-file IO vs sequential vs random IO.
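
To make the trace-logging idea concrete, here is a minimal sketch of a stream wrapper that emits one `key=value` line per call. It is not the FSDataInputStream implementation: the class name `TracingInputStream`, the field names in each line and the `demo()` helper are all hypothetical, and events are collected in a list so the example runs with the JDK alone. A real version would presumably write the same lines to a logger at TRACE level.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: wraps any InputStream and records one
 * machine-parsable line per read/skip call.
 */
public class TracingInputStream extends InputStream {
    private final InputStream in;
    private final String name;
    private final List<String> events = new ArrayList<>();
    private long pos; // offset in the stream before the next call

    public TracingInputStream(String name, InputStream in) {
        this.name = name;
        this.in = in;
    }

    // One line per call: fixed key order, space separated, easy to parse later.
    private void record(String op, long offset, long requested, long actual) {
        events.add(String.format(Locale.ROOT,
            "op=%s stream=%s pos=%d requested=%d actual=%d",
            op, name, offset, requested, actual));
    }

    @Override
    public int read() throws IOException {
        long start = pos;
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        record("read", start, 1, b >= 0 ? 1 : -1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = pos;
        int n = in.read(buf, off, len);
        if (n > 0) {
            pos += n;
        }
        record("read", start, len, n);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long start = pos;
        long skipped = in.skip(n);
        pos += skipped;
        record("skip", start, n, skipped);
        return skipped;
    }

    @Override
    public void close() throws IOException {
        record("close", pos, 0, 0);
        in.close();
    }

    public List<String> events() {
        return events;
    }

    /** Small driver: a bulk read, a forward skip, then a single-byte read. */
    public static List<String> demo() {
        try (TracingInputStream s = new TracingInputStream(
                "demo", new ByteArrayInputStream(new byte[100]))) {
            s.read(new byte[10], 0, 10);
            s.skip(50);
            s.read();
            return s.events();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Lines in this shape can be grouped by `stream` and ordered by call to reconstruct an application's access pattern (sequential, forward-skipping or random), which is the kind of evidence the seek-policy and buffering questions above would need.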
Issue Links
- depends upon
  - HADOOP-18028 High performance S3A input stream with prefetching & caching (Open)
  - HADOOP-16202 Enhance openFile() for better read performance against object stores (Resolved)
- is depended upon by
  - HADOOP-18477 Über-jira: S3A Hadoop 3.3.9 features (Open)
- relates to
  - HADOOP-17842 S3a parquet reads slow with Spark on Kubernetes (EKS) (Resolved)