  1. Hadoop Common
  2. HADOOP-18103

High performance vectored read API in Hadoop

Details

    Description

      Add support for a vectored read API, which reads multiple ranges in a single call, to PositionedReadable. The default implementation iterates through the ranges and reads each one synchronously, but the intent is that FSDataInputStream subclasses can provide more efficient readers, especially object store implementations.
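
      The default behaviour described above can be sketched as follows. This is a minimal, self-contained illustration and not the Hadoop implementation: `Range`, `PositionedSource`, and `fromBytes` are hypothetical stand-ins invented for this example, and the real classes in Hadoop may differ in names and signatures. It shows the fallback strategy only: each range is read synchronously, in order, and its future is completed with the data or with the failure.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

public class VectoredReadSketch {

    /** Hypothetical stand-in for a file range: offset, length, and a future for its data. */
    static final class Range {
        final long offset;
        final int length;
        final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for a stream that supports positioned reads. */
    interface PositionedSource {
        void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    }

    /**
     * Naive default: iterate through the ranges and read each one synchronously.
     * A smarter implementation (for example, one backed by an object store)
     * could instead issue the reads in parallel or coalesce nearby ranges.
     */
    static void readVectored(PositionedSource in, List<Range> ranges,
                             IntFunction<ByteBuffer> allocate) {
        for (Range r : ranges) {
            try {
                ByteBuffer buf = allocate.apply(r.length);
                byte[] tmp = new byte[r.length];
                in.readFully(r.offset, tmp, 0, r.length);
                buf.put(tmp);
                buf.flip(); // position 0, limit = range length
                r.data.complete(buf);
            } catch (Exception e) {
                // A failure affects only this range's future.
                r.data.completeExceptionally(e);
            }
        }
    }

    /** Hypothetical in-memory source used only to exercise the sketch. */
    static PositionedSource fromBytes(byte[] content) {
        return (position, buffer, offset, length) -> {
            if (position < 0 || position + length > content.length) {
                throw new EOFException("range extends past end of data");
            }
            System.arraycopy(content, (int) position, buffer, offset, length);
        };
    }

    public static void main(String[] args) {
        byte[] content = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        List<Range> ranges = List.of(new Range(2, 3), new Range(10, 4));
        readVectored(fromBytes(content), ranges, ByteBuffer::allocate);
        for (Range r : ranges) {
            ByteBuffer buf = r.data.join();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            System.out.println(r.offset + "+" + r.length + ": "
                + new String(out, StandardCharsets.US_ASCII));
        }
    }
}
```

      Returning one future per range lets a caller start processing whichever range arrives first, which is what makes an asynchronous override worthwhile even though this fallback completes every future before returning.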

      Attachments

        1. Vectored Read API for Hadoop FS.pdf
          463 kB
          Mukund Thakur

        Issue Links

        1. Add a high-performance vectored read API. (Sub-task, Resolved, Mukund Thakur, time spent 13h)
        2. Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads. (Sub-task, Resolved, Mukund Thakur, time spent 3h 40m)
        3. Implement a variant of ElasticByteBufferPool which uses weak references for garbage collection. (Sub-task, Resolved, Mukund Thakur, time spent 4h)
        4. Handle memory fragmentation in S3 Vectored IO implementation. (Sub-task, Resolved, Mukund Thakur, time spent 4.5h)
        5. Vectored IO support for large S3 files. (Sub-task, Resolved, Mukund Thakur, time spent 4h)
        6. Add input stream IOstats for vectored IO api in S3A. (Sub-task, Resolved, Mukund Thakur, time spent 1h)
        7. Memory fragmentation in ChecksumFileSystem Vectored IO implementation. (Sub-task, Resolved, Unassigned)
        8. Restrict vectoredIO threadpool to reduce memory pressure. (Sub-task, Resolved, Mukund Thakur)
        9. Update previous index properly while validating overlapping ranges. (Sub-task, Resolved, Mukund Thakur, time spent 1h)
        10. Propagate vectored s3a input stream stats to file system stats. (Sub-task, Resolved, Mukund Thakur, time spent 20m)
        11. Improve vectored IO api spec. (Sub-task, Resolved, Mukund Thakur)
        12. Improve VectoredReadUtils#readVectored() for direct buffers. (Sub-task, Resolved, Mukund Thakur)
        13. Fix VectoredIO for LocalFileSystem when checksum is enabled. (Sub-task, Resolved, Mukund Thakur)
        14. Vectored IO: Threadpool should be closed on interrupts or during close calls. (Sub-task, Resolved, Unassigned)
        15. ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing. (Sub-task, Resolved, Mukund Thakur)
        16. Add an integration test to process data asynchronously during vectored read. (Sub-task, Resolved, Mukund Thakur)
        17. VectorIO FileRange type to support a "reference" field. (Sub-task, Resolved, Steve Loughran)
        18. ChecksumFileSystem::readVectored might return byte buffers not positioned at 0. (Sub-task, Resolved, Harshit Gupta)

          People

            mthakur Mukund Thakur
            Votes: 0
            Watchers: 17

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 33h