  1. Hadoop Common
  2. HADOOP-18028 High performance S3A input stream with prefetching & caching
  3. HADOOP-18852

S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look like random IO


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.6
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with separately.

      1. All seeks are lazy; no fetching is kicked off after an open.
      2. The first read is treated as an out-of-order read, so it cancels any active reads (there probably aren't any at that point) and then asks for only one block; a caller-side sketch follows the quoted code.
          if (outOfOrderRead) {
            LOG.debug("lazy-seek({})", getOffsetStr(readPos));
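            // cancel any in-flight prefetch reads; on the first read after open there are none to cancel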
            blockManager.cancelPrefetches();
      
            // We prefetch only 1 block immediately after a seek operation.
            prefetchCount = 1;
          }
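
      As a caller-side illustration (a sketch only, not part of this issue): even when the file is opened with a sequential read policy through the openFile() builder, the first read() after open still takes the outOfOrderRead branch above and asks for a single block. The bucket and path below are hypothetical; the option keys assumed here are the standard openFile() read-policy key and the S3A switch that enables the prefetching stream.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class FirstReadLooksRandom {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // enable the prefetching/caching input stream (S3ACachingInputStream)
          conf.setBoolean("fs.s3a.prefetch.enabled", true);

          Path path = new Path("s3a://bucket/dataset/part-00000");   // hypothetical path
          FileSystem fs = path.getFileSystem(conf);

          // declare sequential IO through the openFile() builder
          try (FSDataInputStream in = fs.openFile(path)
              .opt("fs.option.openfile.read.policy", "sequential")
              .build().get()) {
            byte[] buffer = new byte[8192];
            // open() started no fetch, so this first read is treated as an
            // out-of-order read: active prefetches are cancelled and only
            // one block is requested, despite the declared sequential policy.
            int bytesRead = in.read(buffer);
            System.out.println("first read returned " + bytesRead + " bytes");
          }
        }
      }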
      
      
      • For any readFully() call we should prefetch all blocks in the requested range (a sketch of these three points follows this list).
      • For other reads, we may want a bigger prefetch count than 1, depending on the split start/end and the file read policy (random, sequential, whole-file).
      • Also, if a read lands in a block other than the current one, but that block is already being fetched or cached, is it really an out-of-order read to the extent that outstanding fetches should be cancelled?
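
      A minimal sketch of what those three points could look like. This is not the actual fix, and every name here (PrefetchPlanner, blocksForRange, prefetchCountAfterSeek, shouldCancelPrefetches) is invented for illustration; the real stream would take the block size, the configured prefetch block count and the read policy from the S3A configuration and the openFile() parameters.

      import java.util.Set;

      /** Illustration only: prefetch planning for the cases listed above. */
      final class PrefetchPlanner {

        enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

        private final long blockSize;

        PrefetchPlanner(long blockSize) {
          this.blockSize = blockSize;
        }

        /** readFully(position, length): prefetch every block the range touches. */
        int[] blocksForRange(long position, int length) {
          int first = (int) (position / blockSize);
          int last = (int) ((position + Math.max(length, 1) - 1) / blockSize);
          int[] blocks = new int[last - first + 1];
          for (int i = 0; i < blocks.length; i++) {
            blocks[i] = first + i;
          }
          return blocks;
        }

        /**
         * Plain reads after a seek: instead of a hard-coded 1, let the count
         * depend on the declared read policy (and, not shown here, on the
         * split start/end).
         */
        int prefetchCountAfterSeek(ReadPolicy policy, int configuredMaxBlocks) {
          switch (policy) {
          case RANDOM:
            return 1;                                    // keep today's behaviour for random IO
          case SEQUENTIAL:
            return Math.max(1, configuredMaxBlocks / 2); // arbitrary illustrative choice
          case WHOLE_FILE:
            return configuredMaxBlocks;                  // keep the pipeline full
          default:
            return 1;
          }
        }

        /**
         * Third bullet: only treat a jump as out-of-order (and so cancel the
         * outstanding prefetches) if the target block is neither cached nor
         * already being fetched.
         */
        boolean shouldCancelPrefetches(int targetBlock, Set<Integer> cachedOrFetching) {
          return !cachedOrFetching.contains(targetBlock);
        }
      }

      For example, with an 8 MB block size, blocksForRange(5 * 1024 * 1024, 10 * 1024 * 1024) covers blocks 0 and 1, so a readFully() over that range would queue both blocks rather than just the one holding the start position.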


          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)
            Votes: 0
            Watchers: 2
