Description
S3A (and the other object store connectors) performs a lot of IO when reading a file, even a 0 byte one. It doesn't need to: a 0 byte file is a special case which can be handled locally. A special ZeroByteInputStream class could handle this for all the object stores.
This isn't much of an optimization: code shouldn't normally need to read 0 byte files, but we see evidence that it does sometimes happen.
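A minimal sketch of what such a class might look like, assuming nothing beyond the JDK. The class body, the `checkOpen` helper and the exception messages are illustrative assumptions, not existing Hadoop code. A real version would presumably extend `org.apache.hadoop.fs.FSInputStream` so that it picks up the `Seekable` and `PositionedReadable` contracts; here `seek` and `getPos` are written as plain methods to keep the example self-contained.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of a stream over a zero-byte object.
 * Every call is answered locally; no request is sent to the store.
 */
final class ZeroByteInputStream extends InputStream {

  private boolean closed;

  /** Assumed helper: reject operations on a closed stream. */
  private void checkOpen() throws IOException {
    if (closed) {
      throw new IOException("Stream is closed");
    }
  }

  @Override
  public int read() throws IOException {
    checkOpen();
    return -1; // always at end of stream
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    checkOpen();
    if (off < 0 || len < 0 || len > buf.length - off) {
      throw new IndexOutOfBoundsException();
    }
    // A zero-length read returns 0; anything else hits end of stream.
    return len == 0 ? 0 : -1;
  }

  @Override
  public long skip(long n) throws IOException {
    checkOpen();
    return 0; // nothing to skip
  }

  @Override
  public int available() throws IOException {
    checkOpen();
    return 0;
  }

  /** Seekable-style method: only offset 0 is valid in an empty file. */
  public void seek(long pos) throws IOException {
    checkOpen();
    if (pos != 0) {
      throw new EOFException("Cannot seek to " + pos + " in a zero-byte file");
    }
  }

  /** Seekable-style method: the position never moves. */
  public long getPos() {
    return 0;
  }

  @Override
  public void close() {
    closed = true;
  }
}
```

A connector's open call could return this stream whenever the file status it already holds reports a length of 0, and fall back to its normal stream otherwise.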
Issue Links
- is related to
  - SPARK-24273 Failure while using .checkpoint method to private S3 store via S3A connector (Resolved)
- relates to
  - HADOOP-13853 S3ADataBlocks.DiskBlock to lazy create dest file for faster 0-byte puts (Resolved)