Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
Azure's BlobInputStream internally buffers 4 MB of data regardless of the length requested. This is beneficial for sequential reads. However, for positional reads (seek to a specific location, read x bytes, seek back to the original location) it may not help, and it may even download far more data than is ever used.
It would be good to override readFully(long position, byte[] buffer, int offset, int length) for NativeAzureFsInputStream and make use of mark(readLimit) as a hint to Azure's BlobInputStream.
BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
BlobInputStream can consider this as a hint later to determine the amount of data to be read ahead. Changes to BlobInputStream would not be addressed in this JIRA.
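The positional-read pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `PositionalReadSketch`, `InMemorySeekableStream`, and the `lastReadLimitHint` field are hypothetical stand-ins invented here so the example runs with the JDK alone, and the real NativeAzureFsInputStream and BlobInputStream may behave differently. It only shows the order of operations: remember the current position, seek, pass the requested length through `mark(readlimit)` as a hint, read until the buffer range is filled, then restore the position.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Sketch only: none of these types are Hadoop or Azure SDK classes. */
final class PositionalReadSketch {

    /** Hypothetical in-memory stand-in for a seekable, buffering blob stream. */
    static class InMemorySeekableStream extends InputStream {
        private final byte[] data;
        private int pos;
        int lastReadLimitHint = -1; // most recent mark(readlimit) value, -1 if never set

        InMemorySeekableStream(byte[] data) {
            this.data = data;
        }

        void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos > data.length) {
                throw new EOFException("Cannot seek to " + newPos);
            }
            pos = (int) newPos;
        }

        long getPos() {
            return pos;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public void mark(int readlimit) {
            // A real stream could use this value to bound its read-ahead.
            lastReadLimitHint = readlimit;
        }

        @Override
        public boolean markSupported() {
            return true;
        }
    }

    /** Reads exactly {@code length} bytes at {@code position}, then restores the old position. */
    static void readFully(InMemorySeekableStream in, long position,
                          byte[] buffer, int offset, int length) throws IOException {
        if (offset < 0 || length < 0 || length > buffer.length - offset) {
            throw new IndexOutOfBoundsException("Invalid offset/length");
        }
        long oldPos = in.getPos();
        try {
            in.seek(position);
            in.mark(length); // hint: only this many bytes are wanted
            int done = 0;
            while (done < length) {
                int n = in.read(buffer, offset + done, length - done);
                if (n < 0) {
                    throw new EOFException("End of stream before reading " + length + " bytes");
                }
                done += n;
            }
        } finally {
            in.seek(oldPos); // positional reads must not move the stream position
        }
    }
}
```

Restoring the position in a `finally` block keeps the stream usable for subsequent sequential reads even when the positional read fails partway through.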
Attachments
Issue Links
- breaks
  - HADOOP-14500 Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails (Resolved)
- contains
  - HADOOP-14490 Upgrade azure-storage sdk version >5.4.0 (Resolved)
- is depended upon by
  - HADOOP-14552 Über-jira: WASB client phase II: performance and testing (Resolved)
- is related to
  - HADOOP-14473 Optimize NativeAzureFileSystem::seek for forward seeks (Closed)
  - HADOOP-14552 Über-jira: WASB client phase II: performance and testing (Resolved)
- relates to
  - HADOOP-16317 ABFS: improve random read performance (Open)