  1. Hadoop Common
  2. HADOOP-14478

Optimize NativeAzureFsInputStream for positional reads

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: fs/azure
    • Labels: None

    Description

      Azure's BlobInputStream internally buffers 4 MB of data irrespective of the length of data requested. This is beneficial for sequential reads. However, for positional reads (seek to a specific location, read x bytes, seek back to the original location) it may not be beneficial and might even download a lot more data that is never used.

      It would be good to override readFully(long position, byte[] buffer, int offset, int length) for NativeAzureFsInputStream and make use of mark(readLimit) as a hint to Azure's BlobInputStream.

      BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448

      BlobInputStream could later use this hint to determine how much data to read ahead. Changes to BlobInputStream are out of scope for this JIRA.
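
      The proposed pattern can be illustrated with a minimal sketch. This is not the attached patch: SeekableStream is a hypothetical in-memory stand-in for the blob stream, and the class and field names are assumptions made for illustration. It shows only the sequence described above: remember the current position, seek to the requested position, pass the requested length to mark(readLimit) as a hint, read fully, then seek back.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;

public class PositionalReadSketch {

    /** Hypothetical in-memory stand-in for a seekable blob stream (not an Azure class). */
    static class SeekableStream extends ByteArrayInputStream {
        // Last value passed to mark(); a real stream could use it to size its read-ahead.
        int lastReadLimitHint = -1;

        SeekableStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized void mark(int readLimit) {
            lastReadLimitHint = readLimit;
            super.mark(readLimit);
        }

        synchronized void seek(long target) throws IOException {
            if (target < 0 || target > count) {
                throw new EOFException("Cannot seek to " + target);
            }
            pos = (int) target;
        }

        synchronized long getPos() {
            return pos;
        }
    }

    /** Positional read: fills buffer[offset..offset+length) from 'position', then restores the old position. */
    static void readFully(SeekableStream in, long position, byte[] buffer,
                          int offset, int length) throws IOException {
        synchronized (in) {
            long oldPos = in.getPos();
            try {
                in.seek(position);
                // Hint: only 'length' bytes are needed from this position.
                in.mark(length);
                int done = 0;
                while (done < length) {
                    int n = in.read(buffer, offset + done, length - done);
                    if (n < 0) {
                        throw new EOFException("End of stream reached before reading " + length + " bytes");
                    }
                    done += n;
                }
            } finally {
                // Restore the position the sequential reader expects.
                in.seek(oldPos);
            }
        }
    }
}
```

      In this sketch the hint is merely recorded; whether and how the underlying stream acts on it depends on BlobInputStream, which this issue does not change.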

      Attachments

        1. HADOOP-14478.001.patch
          3 kB
          Rajesh Balamohan
        2. HADOOP-14478.002.patch
          3 kB
          Rajesh Balamohan
        3. HADOOP-14478.003.patch
          3 kB
          Rajesh Balamohan

            People

              Assignee: Rajesh Balamohan
              Reporter: Rajesh Balamohan