  1. Hadoop Common
  2. HADOOP-14478

Optimize NativeAzureFsInputStream for positional reads

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: fs/azure
    • Labels:
      None

      Description

      Azure's BlobInputStream internally buffers 4 MB of data irrespective of the data length requested. This is beneficial for sequential reads. However, for positional reads (seek to a specific location, read x bytes, seek back to the original location) it may not be beneficial and might even download a lot more data that is never used.

      It would be good to override readFully(long position, byte[] buffer, int offset, int length) in NativeAzureFsInputStream and use mark(readLimit) to pass the requested length as a hint to Azure's BlobInputStream.

      BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448

      BlobInputStream could later use this hint to decide how much data to read ahead. Changes to BlobInputStream are out of scope for this JIRA.
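
      The positional-read pattern described above (seek, mark with the requested length, read, seek back) can be sketched roughly as follows. This is only an illustration and not the attached patch: HintRecordingStream and PositionalReader are hypothetical stand-ins built on java.io.ByteArrayInputStream, and the assumption that the underlying stream would treat mark(readLimit) as a read-ahead hint comes from the description rather than from the Azure SDK's current behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for a blob stream; it only records the last mark() hint. */
class HintRecordingStream extends ByteArrayInputStream {
    int lastReadLimitHint = -1;

    HintRecordingStream(byte[] data) {
        super(data);
    }

    @Override
    public synchronized void mark(int readLimit) {
        // A real stream could use readLimit to size its read-ahead buffer.
        lastReadLimitHint = readLimit;
        super.mark(readLimit);
    }

    /** Absolute seek; ByteArrayInputStream exposes pos and count as protected fields. */
    synchronized void seek(long position) throws IOException {
        if (position < 0 || position > count) {
            throw new EOFException("Cannot seek to " + position);
        }
        pos = (int) position;
    }

    synchronized long getPos() {
        return pos;
    }
}

/** Hypothetical wrapper showing the shape of an overridden readFully. */
class PositionalReader {
    private final HintRecordingStream in;

    PositionalReader(HintRecordingStream in) {
        this.in = in;
    }

    synchronized void readFully(long position, byte[] buffer, int offset, int length)
            throws IOException {
        long oldPos = in.getPos();
        try {
            in.seek(position);
            // Pass the exact request size down as a hint before reading.
            in.mark(length);
            int done = 0;
            while (done < length) {
                int n = in.read(buffer, offset + done, length - done);
                if (n < 0) {
                    throw new EOFException("End of stream before reading " + length + " bytes");
                }
                done += n;
            }
        } finally {
            // Restore the original position so sequential readers are unaffected.
            in.seek(oldPos);
        }
    }
}

class PositionalReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        HintRecordingStream stream = new HintRecordingStream(data);
        stream.seek(10);
        byte[] buf = new byte[8];
        new PositionalReader(stream).readFully(50, buf, 0, buf.length);
        System.out.println("first byte=" + buf[0]
                + " pos=" + stream.getPos()
                + " hint=" + stream.lastReadLimitHint);
    }
}
```

      The finally block matters here: the original position is restored even if the read fails part-way, so a failed positional read does not disturb a caller that is also reading the stream sequentially.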

        Attachments

        1. HADOOP-14478.001.patch
          3 kB
          Rajesh Balamohan
        2. HADOOP-14478.002.patch
          3 kB
          Rajesh Balamohan
        3. HADOOP-14478.003.patch
          3 kB
          Rajesh Balamohan


              People

              • Assignee:
                rajesh.balamohan Rajesh Balamohan
                Reporter:
                rajesh.balamohan Rajesh Balamohan
              • Votes:
                0
                Watchers:
                7
