  1. Hadoop Common
  2. HADOOP-18103

High performance vectored read API in Hadoop

Details

    Description

      Add support for a vectored read API that accepts multiple ranges in PositionedReadable. The default implementation iterates through the ranges and reads each one synchronously, but the intent is that FSDataInputStream subclasses can provide more efficient readers, especially object store implementations.
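
      The default behaviour described above (iterate over the ranges and read each one synchronously) can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Hadoop classes: `VectoredReadSketch`, `Range`, `PositionedSource` and `inMemory` are hypothetical names chosen for this example, and the real API in Hadoop may differ in names, signatures and error handling.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Illustrative stand-ins only; these are NOT the Hadoop classes. */
class VectoredReadSketch {

  /** A requested byte range plus a future that will eventually hold its data. */
  static final class Range {
    final long offset;
    final int length;
    final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Hypothetical positioned-read source with a naive default vectored read. */
  interface PositionedSource {
    /** Assumption: reads exactly {@code length} bytes or throws. */
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException;

    /** Default: read the ranges one by one, synchronously, completing each future. */
    default void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate) {
      for (Range r : ranges) {
        try {
          ByteBuffer buf = allocate.apply(r.length);
          byte[] tmp = new byte[r.length];
          readFully(r.offset, tmp, 0, r.length);
          buf.put(tmp);
          buf.flip(); // hand the buffer back positioned at 0, ready to be read
          r.data.complete(buf);
        } catch (Exception e) {
          // A failed range fails only its own future; later ranges are still attempted.
          r.data.completeExceptionally(e);
        }
      }
    }
  }

  /** In-memory source used for the demonstration. */
  static PositionedSource inMemory(byte[] bytes) {
    return (position, buffer, offset, length) -> {
      if (position < 0 || position + length > bytes.length) {
        throw new EOFException("range exceeds source length " + bytes.length);
      }
      System.arraycopy(bytes, (int) position, buffer, offset, length);
    };
  }

  public static void main(String[] args) throws Exception {
    byte[] file = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    List<Range> ranges = Arrays.asList(new Range(2, 3), new Range(10, 4));
    inMemory(file).readVectored(ranges, ByteBuffer::allocate);
    for (Range r : ranges) {
      ByteBuffer b = r.data.get();
      byte[] out = new byte[b.remaining()];
      b.get(out);
      System.out.println(r.offset + "+" + r.length + " -> "
          + new String(out, StandardCharsets.US_ASCII));
    }
  }
}
```

      A more efficient implementation, for example one backed by an object store, could override `readVectored` to issue the reads in parallel or to combine nearby ranges into a single request, while callers keep consuming the same per-range futures.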

      Attachments

        1. Vectored Read API for Hadoop FS.pdf
          463 kB
          Mukund Thakur

        Issue Links

          1. Add a high-performance vectored read API. (Sub-task, Resolved, Mukund Thakur; time spent: 13h)
          2. Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads (Sub-task, Resolved, Mukund Thakur; time spent: 3h 40m)
          3. Implement a variant of ElasticByteBufferPool which uses weak references for garbage collection. (Sub-task, Resolved, Mukund Thakur; time spent: 4h)
          4. Handle memory fragmentation in S3 Vectored IO implementation. (Sub-task, Resolved, Mukund Thakur; time spent: 4.5h)
          5. Vectored IO support for large S3 files. (Sub-task, Resolved, Mukund Thakur; time spent: 4h)
          6. Add input stream IOstats for vectored IO api in S3A. (Sub-task, Resolved, Mukund Thakur; time spent: 1h)
          7. Memory fragmentation in ChecksumFileSystem Vectored IO implementation. (Sub-task, Resolved, Unassigned)
          8. Restrict vectoredIO threadpool to reduce memory pressure (Sub-task, Resolved, Mukund Thakur)
          9. Update previous index properly while validating overlapping ranges. (Sub-task, Resolved, Mukund Thakur; time spent: 1h)
          10. Propagate vectored s3a input stream stats to file system stats. (Sub-task, Resolved, Mukund Thakur; time spent: 20m)
          11. Improve vectored IO api spec. (Sub-task, Resolved, Mukund Thakur)
          12. Improve VectoredReadUtils#readVectored() for direct buffers (Sub-task, Resolved, Mukund Thakur)
          13. Fix VectoredIO for LocalFileSystem when checksum is enabled. (Sub-task, Resolved, Mukund Thakur)
          14. Vectored IO: Threadpool should be closed on interrupts or during close calls (Sub-task, Open, Unassigned)
          15. ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing (Sub-task, Resolved, Mukund Thakur)
          16. Add an integration test to process data asynchronously during vectored read. (Sub-task, Resolved, Mukund Thakur)
          17. VectorIO FileRange type to support a "reference" field (Sub-task, Resolved, Steve Loughran)
          18. ChecksumFileSystem::readVectored might return byte buffers not positioned at 0 (Sub-task, Resolved, Harshit Gupta)

            People

              mthakur Mukund Thakur
              Votes: 0
              Watchers: 15

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 33h