Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      Even with lazy seek, tests can show that sometimes a short-distance forward seek is triggering a close + reopen, because the threshold for the seek is simply available bytes in the inner stream.

      A configurable threshold would allow data to be read and discarded before that seek. This should be beneficial over long-haul networks as the time to set up the TCP channel is high, and TCP-slow-start means that the ramp up of bandwidth is slow. In such deployments, it will better to read forward than re-open, though the exact "best" number will vary with client and endpoint.

      1. HADOOP-13047.WIP.2.patch
        31 kB
        Rajesh Balamohan
      2. HADOOP-13047.WIP.patch
        8 kB
        Rajesh Balamohan

        Issue Links

          Activity

          Hide
          stevel@apache.org Steve Loughran added a comment -

          Output of a test run from my laptop (E over Power to base station, BT FTTC 80Mbit down/15 up), connected to amazon US.

          This is with lazy seek; the log is showing initial getPos() value, then final getPos(); actual will
          Looks like 1s is time to open a connection

          
          2016-04-21 20:26:32,979 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of stat = 187,328,000 ns
          2016-04-21 20:26:33,149 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of open = 169,326,000 ns
          2016-04-21 20:26:33,149 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:33,496 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 0] = 345,822,000 ns
          2016-04-21 20:26:33,496 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=1 contentLength=20299927}
          2016-04-21 20:26:33,496 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:33,497 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 1] = 17,000 ns
          2016-04-21 20:26:33,497 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20299927}
          2016-04-21 20:26:33,497 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:33,497 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 256] = 17,000 ns
          2016-04-21 20:26:33,498 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20299927}
          2016-04-21 20:26:33,499 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:33,499 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(EOF-2) [pos = 256] = 12,000 ns
          2016-04-21 20:26:33,499 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=20299925 contentLength=20299927}
          2016-04-21 20:26:33,499 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:34,423 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20299925] = 922,650,000 ns
          2016-04-21 20:26:34,423 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20299926 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:34,423 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:34,758 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte[1]) [pos = 20299926] = 333,713,000 ns
          2016-04-21 20:26:34,758 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=2 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:34,758 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          // series of forward looking reads
          2016-04-21 20:26:35,767 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte[256]) [pos = 20299926] = 1,008,214,000 ns
          2016-04-21 20:26:35,767 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=257 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:35,767 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          next read is in available(), so cost is 61 microseconds
          2016-04-21 20:26:35,768 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(260, byte[256]) [pos = 20299926] = 61,000 ns
          2016-04-21 20:26:35,768 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=516 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:35,768 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          As is this one, just under 512 bytes ahead
          2016-04-21 20:26:35,768 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1024, byte[256]) [pos = 20299926] = 23,000 ns
          2016-04-21 20:26:35,769 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1280 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:35,769 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:35,769 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1536, byte[256]) [pos = 20299926] = 28,000 ns
          2016-04-21 20:26:35,769 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1792 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:35,769 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          // going forward to 8192 bytes triggers a full close and read
          2016-04-21 20:26:38,634 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(8192, byte[1024]) [pos = 20299926] = 2,863,740,000 ns
          2016-04-21 20:26:38,634 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=9216 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:38,634 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          // as now does a read ony 512 bytes ahead
          2016-04-21 20:26:39,777 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(9728, byte[1024]) [pos = 20299926] = 1,142,347,000 ns
          2016-04-21 20:26:39,778 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:39,778 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:39,778 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(getPos) [pos = 20299926] = 44,000 ns
          2016-04-21 20:26:39,779 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20299926 contentLength=20299927}
          2016-04-21 20:26:39,779 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-21 20:26:40,660 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20299926] = 880,644,000 ns
          2016-04-21 20:26:40,660 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20299927 nextReadPos=20299927 contentLength=20299927}
          2016-04-21 20:26:40,661 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of close() = 66,000 ns
          2016-04-21 20:26:40,662 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          

          The notable ones are the readFully(8192, byte[1024]) followed by the next read only 512 bytes ahead of the pointer. That was further ahead than had collected locally, which triggered a full reconnect and a delay of over a second as a result.

          Show
          stevel@apache.org Steve Loughran added a comment - Output of a test run from my laptop (E over Power to base station, BT FTTC 80Mbit down/15 up), connected to amazon US. This is with lazy seek; the log is showing initial getPos() value, then final getPos(); actual will Looks like 1s is time to open a connection 2016-04-21 20:26:32,979 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of stat = 187,328,000 ns 2016-04-21 20:26:33,149 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of open = 169,326,000 ns 2016-04-21 20:26:33,149 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:33,496 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 0] = 345,822,000 ns 2016-04-21 20:26:33,496 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=1 contentLength=20299927} 2016-04-21 20:26:33,496 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:33,497 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 1] = 17,000 ns 2016-04-21 20:26:33,497 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20299927} 2016-04-21 20:26:33,497 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:33,497 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 256] = 17,000 ns 2016-04-21 20:26:33,498 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20299927} 2016-04-21 20:26:33,499 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:33,499 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(EOF-2) [pos = 256] = 12,000 ns 2016-04-21 20:26:33,499 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=20299925 contentLength=20299927} 2016-04-21 20:26:33,499 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:34,423 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20299925] = 922,650,000 ns 2016-04-21 20:26:34,423 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=20299926 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:34,423 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:34,758 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte [1]) [pos = 20299926] = 333,713,000 ns 2016-04-21 20:26:34,758 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=2 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:34,758 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - // series of forward looking reads 2016-04-21 20:26:35,767 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte [256]) [pos = 20299926] = 1,008,214,000 ns 2016-04-21 20:26:35,767 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=257 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:35,767 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - next read is in available(), so cost is 61 microseconds 2016-04-21 20:26:35,768 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(260, byte [256]) [pos = 20299926] = 61,000 ns 2016-04-21 20:26:35,768 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=516 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:35,768 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - As is this one, just under 512 bytes ahead 2016-04-21 20:26:35,768 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1024, byte [256]) [pos = 20299926] = 23,000 ns 2016-04-21 20:26:35,769 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1280 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:35,769 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:35,769 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1536, byte [256]) [pos = 20299926] = 28,000 ns 2016-04-21 20:26:35,769 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1792 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:35,769 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - // going forward to 8192 bytes triggers a full close and read 2016-04-21 20:26:38,634 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(8192, byte [1024]) [pos = 20299926] = 2,863,740,000 ns 2016-04-21 20:26:38,634 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=9216 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:38,634 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - // as now does a read ony 512 bytes ahead 2016-04-21 20:26:39,777 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(9728, byte [1024]) [pos = 20299926] = 1,142,347,000 ns 2016-04-21 20:26:39,778 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=10752 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:39,778 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:39,778 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(getPos) [pos = 20299926] = 44,000 ns 2016-04-21 20:26:39,779 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=10752 nextReadPos=20299926 contentLength=20299927} 2016-04-21 20:26:39,779 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-21 20:26:40,660 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20299926] = 880,644,000 ns 2016-04-21 20:26:40,660 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=20299927 nextReadPos=20299927 contentLength=20299927} 2016-04-21 20:26:40,661 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of close() = 66,000 ns 2016-04-21 20:26:40,662 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - The notable ones are the readFully(8192, byte [1024] ) followed by the next read only 512 bytes ahead of the pointer. That was further ahead than had collected locally, which triggered a full reconnect and a delay of over a second as a result.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          I propose

          1. having a configurable default value
          2. implementing setReadAhead(long) to allow code to dynamically tune this value. It does't quite control the pre-load, but it is tuning how far ahead the stream can read before triggering an expensive seek.
          Show
          stevel@apache.org Steve Loughran added a comment - I propose having a configurable default value implementing setReadAhead(long) to allow code to dynamically tune this value. It does't quite control the pre-load, but it is tuning how far ahead the stream can read before triggering an expensive seek.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          + add all bytes skipped to the statistics of bytes read. Currently they aren't

          Show
          stevel@apache.org Steve Loughran added a comment - + add all bytes skipped to the statistics of bytes read. Currently they aren't
          Hide
          rajesh.balamohan Rajesh Balamohan added a comment -

          Attaching the high level WIP patch. Based on the gathered statistics on the amount of data read so far and the time taken to connect, it should be possible to determine whether to establish a new connection or to read from existing stream itself (like the case you had pointed earlier). WIP tries to address this scenario. It might not be possible to use something like ReadAheadPool in hadoop directly as that is based on FileDescriptor.

          Show
          rajesh.balamohan Rajesh Balamohan added a comment - Attaching the high level WIP patch. Based on the gathered statistics on the amount of data read so far and the time taken to connect, it should be possible to determine whether to establish a new connection or to read from existing stream itself (like the case you had pointed earlier). WIP tries to address this scenario. It might not be possible to use something like ReadAheadPool in hadoop directly as that is based on FileDescriptor.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          thanks. I'm doing the low-level metrics patch right now, even though the two patches address different things, they wont merge cleanly. Just a warning...

          If you look at SwiftNativeInputStream, it reads ahead an arbitrary number of bytes; we could do something like that

          Show
          stevel@apache.org Steve Loughran added a comment - thanks. I'm doing the low-level metrics patch right now, even though the two patches address different things, they wont merge cleanly. Just a warning... If you look at SwiftNativeInputStream, it reads ahead an arbitrary number of bytes; we could do something like that
          Hide
          stevel@apache.org Steve Loughran added a comment -

          Rajesh, here's the output of my instrumented test run, showing the stats collected. These statistics will be accessible at the stream level, so can be used to test whether the extended readahead has actually worked. I also add the bytes skipped to the count of bytes read, so you can implicitly work out if a stream skipped, though it's a lot trickier than just getting the stats on bytesSkippedOnSeek

          ===== TEST OUTPUT FOR o.a.s.cloud.s3.S3aIOSuite: 'SeekReadFully: Cost of seek and ReadFully' =====
          
          2016-04-22 14:46:49,509 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of stat = 220,789,000 ns
          2016-04-22 14:46:49,704 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of open = 195,051,000 ns
          2016-04-22 14:46:49,705 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:50,136 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 0] = 430,721,000 ns
          2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=1 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 1] = 22,000 ns
          2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 256] = 17,000 ns
          2016-04-22 14:46:50,139 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:50,140 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:50,141 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(EOF-2) [pos = 256] = 25,000 ns
          2016-04-22 14:46:50,141 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=20314848 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:50,142 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:51,069 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20314848] = 927,211,000 ns
          2016-04-22 14:46:51,070 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20314849 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=1, streamOpened=2, streamCloseOperations=1, backwardSeekOperations=0, streamSeekOperations=1, streamClosed=0, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:51,070 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:51,460 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte[1]) [pos = 20314849] = 389,682,000 ns
          2016-04-22 14:46:51,461 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=2 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=1, streamOpened=3, streamCloseOperations=2, backwardSeekOperations=1, streamSeekOperations=2, streamClosed=1, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:51,461 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:52,583 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte[256]) [pos = 20314849] = 1,121,839,000 ns
          2016-04-22 14:46:52,583 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=257 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=3, streamClosed=1, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}}
          2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(260, byte[256]) [pos = 20314849] = 56,000 ns
          2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=516 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=4, streamClosed=1, readExceptions=0, forwardSeekOperations=2, bytesSkippedOnSeek=3}}
          2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1024, byte[256]) [pos = 20314849] = 27,000 ns
          2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1280 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=5, streamClosed=1, readExceptions=0, forwardSeekOperations=3, bytesSkippedOnSeek=511}}
          2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1536, byte[256]) [pos = 20314849] = 25,000 ns
          2016-04-22 14:46:52,586 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1792 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=6, streamClosed=1, readExceptions=0, forwardSeekOperations=4, bytesSkippedOnSeek=767}}
          2016-04-22 14:46:52,586 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:53,773 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(8192, byte[1024]) [pos = 20314849] = 1,186,885,000 ns
          2016-04-22 14:46:53,774 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=9216 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=3, streamOpened=5, streamCloseOperations=4, backwardSeekOperations=2, streamSeekOperations=7, streamClosed=1, readExceptions=0, forwardSeekOperations=5, bytesSkippedOnSeek=767}}
          2016-04-22 14:46:53,774 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(9728, byte[1024]) [pos = 20314849] = 1,116,103,000 ns
          2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=4, streamOpened=6, streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, streamClosed=1, readExceptions=0, forwardSeekOperations=6, bytesSkippedOnSeek=767}}
          2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:54,892 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(getPos) [pos = 20314849] = 18,000 ns
          2016-04-22 14:46:54,892 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=4, streamOpened=6, streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, streamClosed=1, readExceptions=0, forwardSeekOperations=6, bytesSkippedOnSeek=767}}
          2016-04-22 14:46:54,893 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          2016-04-22 14:46:55,854 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20314849] = 960,889,000 ns
          2016-04-22 14:46:55,854 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20314850 nextReadPos=20314850 contentLength=20314850 statistics{streamAborted=5, streamOpened=7, streamCloseOperations=6, backwardSeekOperations=2, streamSeekOperations=9, streamClosed=1, readExceptions=0, forwardSeekOperations=7, bytesSkippedOnSeek=767}}
          2016-04-22 14:46:55,855 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of close() = 70,000 ns
          2016-04-22 14:46:55,862 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Statistics 3843 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops
          2016-04-22 14:46:55,863 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
          
          
          Show
          stevel@apache.org Steve Loughran added a comment - Rajesh, here's the output of my instrumented test run, showing the stats collected. These statistics will be accessible at the stream level, so can be used to test whether the extended readahead has actually worked. I also add the bytes skipped to the count of bytes read, so you can implicitly work out if a stream skipped, though it's a lot trickier than just getting the stats on bytesSkippedOnSeek ===== TEST OUTPUT FOR o.a.s.cloud.s3.S3aIOSuite: 'SeekReadFully: Cost of seek and ReadFully' ===== 2016-04-22 14:46:49,509 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of stat = 220,789,000 ns 2016-04-22 14:46:49,704 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of open = 195,051,000 ns 2016-04-22 14:46:49,705 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:50,136 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 0] = 430,721,000 ns 2016-04-22 14:46:50,137 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=1 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}} 2016-04-22 14:46:50,137 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:50,137 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 1] = 22,000 ns 2016-04-22 14:46:50,138 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}} 2016-04-22 14:46:50,138 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:50,138 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(256) [pos = 256] = 17,000 ns 2016-04-22 14:46:50,139 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=256 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}} 2016-04-22 14:46:50,140 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:50,141 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(EOF-2) [pos = 256] = 25,000 ns 2016-04-22 14:46:50,141 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1 nextReadPos=20314848 contentLength=20314850 statistics{streamAborted=0, streamOpened=1, streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, streamClosed=0, readExceptions=0, forwardSeekOperations=0, bytesSkippedOnSeek=0}} 2016-04-22 14:46:50,142 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:51,069 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20314848] = 927,211,000 ns 2016-04-22 14:46:51,070 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=20314849 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=1, streamOpened=2, streamCloseOperations=1, backwardSeekOperations=0, streamSeekOperations=1, streamClosed=0, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}} 2016-04-22 14:46:51,070 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:51,460 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte [1]) [pos = 20314849] = 389,682,000 ns 2016-04-22 14:46:51,461 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=2 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=1, streamOpened=3, streamCloseOperations=2, backwardSeekOperations=1, streamSeekOperations=2, streamClosed=1, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}} 2016-04-22 14:46:51,461 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:52,583 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1, byte [256]) [pos = 20314849] = 1,121,839,000 ns 2016-04-22 14:46:52,583 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=257 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=3, streamClosed=1, readExceptions=0, forwardSeekOperations=1, bytesSkippedOnSeek=0}} 2016-04-22 14:46:52,584 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:52,584 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(260, byte [256]) [pos = 20314849] = 56,000 ns 2016-04-22 14:46:52,584 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=516 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=4, streamClosed=1, readExceptions=0, forwardSeekOperations=2, bytesSkippedOnSeek=3}} 2016-04-22 14:46:52,584 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:52,585 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1024, byte [256]) [pos = 20314849] = 27,000 ns 2016-04-22 14:46:52,585 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1280 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=5, streamClosed=1, readExceptions=0, forwardSeekOperations=3, bytesSkippedOnSeek=511}} 2016-04-22 14:46:52,585 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:52,585 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(1536, byte [256]) [pos = 20314849] = 25,000 ns 2016-04-22 14:46:52,586 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=1792 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=2, streamOpened=4, streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=6, streamClosed=1, readExceptions=0, forwardSeekOperations=4, bytesSkippedOnSeek=767}} 2016-04-22 14:46:52,586 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:53,773 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(8192, byte [1024]) [pos = 20314849] = 1,186,885,000 ns 2016-04-22 14:46:53,774 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=9216 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=3, streamOpened=5, streamCloseOperations=4, backwardSeekOperations=2, streamSeekOperations=7, streamClosed=1, readExceptions=0, forwardSeekOperations=5, bytesSkippedOnSeek=767}} 2016-04-22 14:46:53,774 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:54,891 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of readFully(9728, byte [1024]) [pos = 20314849] = 1,116,103,000 ns 2016-04-22 14:46:54,891 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=4, streamOpened=6, streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, streamClosed=1, readExceptions=0, forwardSeekOperations=6, bytesSkippedOnSeek=767}} 2016-04-22 14:46:54,891 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:54,892 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of seek(getPos) [pos = 20314849] = 18,000 ns 2016-04-22 14:46:54,892 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=4, streamOpened=6, streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, streamClosed=1, readExceptions=0, forwardSeekOperations=6, bytesSkippedOnSeek=767}} 2016-04-22 14:46:54,893 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - 2016-04-22 14:46:55,854 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of read() [pos = 20314849] = 960,889,000 ns 2016-04-22 14:46:55,854 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - S3AInputStream{s3a: //landsat-pds/scene_list.gz pos=20314850 nextReadPos=20314850 contentLength=20314850 statistics{streamAborted=5, streamOpened=7, streamCloseOperations=6, backwardSeekOperations=2, streamSeekOperations=9, streamClosed=1, readExceptions=0, forwardSeekOperations=7, bytesSkippedOnSeek=767}} 2016-04-22 14:46:55,855 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Duration of close() = 70,000 ns 2016-04-22 14:46:55,862 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) - Statistics 3843 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops 2016-04-22 14:46:55,863 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
          Hide
          stevel@apache.org Steve Loughran added a comment -

          actually, your metrics setup is better than mine...let me see how I can lift them

          Show
          stevel@apache.org Steve Loughran added a comment - actually, your metrics setup is better than mine...let me see how I can lift them
          Hide
          stevel@apache.org Steve Loughran added a comment -

          regarding the patch, I don't think we need to go to anything trying to be adaptive to bandwidth., at least not initially. Having something you can preconfigure should be enough at first.

          Why? For short lived streams, you aren't going to have any statistics....yet you may know across applications and instances of the app whether you are near/far from s3, so can choose some values and see what works

          Show
          stevel@apache.org Steve Loughran added a comment - regarding the patch, I don't think we need to go to anything trying to be adaptive to bandwidth., at least not initially. Having something you can preconfigure should be enough at first. Why? For short lived streams, you aren't going to have any statistics....yet you may know across applications and instances of the app whether you are near/far from s3, so can choose some values and see what works
          Hide
          rajesh.balamohan Rajesh Balamohan added a comment -

          Agreed. Created "fs.s3a.readahead.buffer.size" which can be used for configuring the buffer size (initially thought of reusing io.file.buffer.size in S3AFileSystem, but it would be better to track s3a readahead separately).

          Created the patch which includes HADOOP-13028. Main changes are in seekInputStream. Patch should be a lot simplified once HADOOP-13028 is checked in.

          Show
          rajesh.balamohan Rajesh Balamohan added a comment - Agreed. Created "fs.s3a.readahead.buffer.size" which can be used for configuring the buffer size (initially thought of reusing io.file.buffer.size in S3AFileSystem, but it would be better to track s3a readahead separately). Created the patch which includes HADOOP-13028 . Main changes are in seekInputStream. Patch should be a lot simplified once HADOOP-13028 is checked in.
          Hide
          stevel@apache.org Steve Loughran added a comment -
          1. I like the simplicity of this being a buffer.
          2. There's no need to actually have a separate read+discard loop to go past the available bytes; .skip() will do that, simply blocking until the data is ready. (Though a sanity check at the end is probably wise)
          3. I'd like that buffer size to be settable via setReadahead(). Why? It would allow applications which really knew what they were doing to actually set readahead for their algorithms. Though I don't see any evidence of that method being used anywhere in the Hadoop codebase —does anyone have any data on how often code uses it?
          4. Patch -002 of HADOOP-13028 exposes the statistics of a stream via a visible for testing method...this would all you to write a test which verified that a forward seek within the buffer range worked. This is where that setReadahead() call would really be useful, as you could change the buffer size and verify that forward seeks simply increment the counter of bytes skipped on seek; the number of stream close/open events would be unchanged.
          Show
          stevel@apache.org Steve Loughran added a comment - I like the simplicity of this being a buffer. There's no need to actually have a separate read+discard loop to go past the available bytes; .skip() will do that, simply blocking until the data is ready. (Though a sanity check at the end is probably wise) I'd like that buffer size to be settable via setReadahead(). Why? It would allow applications which really knew what they were doing to actually set readahead for their algorithms. Though I don't see any evidence of that method being used anywhere in the Hadoop codebase —does anyone have any data on how often code uses it? Patch -002 of HADOOP-13028 exposes the statistics of a stream via a visible for testing method...this would all you to write a test which verified that a forward seek within the buffer range worked. This is where that setReadahead() call would really be useful, as you could change the buffer size and verify that forward seeks simply increment the counter of bytes skipped on seek; the number of stream close/open events would be unchanged.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          HADOOP-13028 has an attachment which compares read bandwidths to open time...from the numbers locally to the US, I'm seeing a forward skip of < 800K would be faster than closing and reopening the connection —and that's ignoring the effects of TCP slow start on followon reads

          2016-04-25 17:01:09,584 INFO  scale.S3AScaleTestBase (S3AScaleTestBase.java:end(181)) - Duration of Open stream: 164,965,000 nS
          2016-04-25 17:01:13,613 INFO  scale.S3AScaleTestBase (S3AScaleTestBase.java:end(181)) - Duration of Time to read 20405780 bytes: 4,029,203,000 nS
          2016-04-25 17:01:13,614 INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:bandwidth(135)) - Bandwidth = 5.064471  MB/S
          2016-04-25 17:01:13,614 INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:testTimeToOpenAndReadWholeFileByByte(97)) - An open() call has the equivalent duration of reading 837,385 bytes
          
          
          Show
          stevel@apache.org Steve Loughran added a comment - HADOOP-13028 has an attachment which compares read bandwidths to open time...from the numbers locally to the US, I'm seeing a forward skip of < 800K would be faster than closing and reopening the connection —and that's ignoring the effects of TCP slow start on followon reads 2016-04-25 17:01:09,584 INFO scale.S3AScaleTestBase (S3AScaleTestBase.java:end(181)) - Duration of Open stream: 164,965,000 nS 2016-04-25 17:01:13,613 INFO scale.S3AScaleTestBase (S3AScaleTestBase.java:end(181)) - Duration of Time to read 20405780 bytes: 4,029,203,000 nS 2016-04-25 17:01:13,614 INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:bandwidth(135)) - Bandwidth = 5.064471 MB/S 2016-04-25 17:01:13,614 INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:testTimeToOpenAndReadWholeFileByByte(97)) - An open() call has the equivalent duration of reading 837,385 bytes

            People

            • Assignee:
              stevel@apache.org Steve Loughran
              Reporter:
              stevel@apache.org Steve Loughran
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development