Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.11.0
    • Component/s: None
    • Labels:
      None

      Description

      A seek on a DFSInputStream causes the next read to re-open the socket connection to the datanode and fetch the remainder of the block all over again. This is not optimal.

      A small read followed by a small positive seek could reuse the data already fetched from the datanode as part of the previous read.

      1. smallreadseek4.patch
        3 kB
        dhruba borthakur

          Activity

          dhruba borthakur added a comment -

          If there is a forward seek within the current block, then do not re-issue the block-read request to the datanode.

          Raghu Angadi added a comment -

          Did you mean to do this only for small seeks? The code does not enforce that. Even a 100MB seek will read 100MB of data that should be skipped?

          dhruba borthakur added a comment -

          Use skipBytes() instead of reading all the intervening data that is skipped.
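
For illustration only (this is not the code from the attached patch): a minimal sketch of skipping intervening bytes on an already-open stream rather than reading them into a buffer. It relies on the standard `java.io.InputStream.skip`, which may skip fewer bytes than requested, so the helper loops. The names `SkipSketch` and `skipFully` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

final class SkipSketch {
    /**
     * Tries to skip n bytes on the given stream.
     * Returns the number of bytes actually skipped, which is less
     * than n only if the end of the stream was reached first.
     */
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at end of stream,
                // so probe with a single-byte read to detect EOF.
                if (in.read() < 0) {
                    break;
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return n - remaining;
    }
}
```

Note that skipping on a network stream still transfers the skipped bytes over the wire; it only avoids closing and re-opening the connection.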

          dhruba borthakur added a comment -

          A forward skip within a previously read data block just manipulates the underlying file pointer rather than re-reading the entire block from the datanode.

          This has been reviewed by Milind. His review comment was: "Looks good to me. Can we ask the blockStream to skip diff bytes instead of reading them one by one?"

          dhruba borthakur added a comment -

          In response to Raghu's comments: By "small seeks" I meant "seeks within the current block". The contents of this block were already fetched by the preceding read call. If a data block is 128MB then even a 100MB seek could trigger this particular optimization.

          Hadoop QA added a comment -

          +1, because http://issues.apache.org/jira/secure/attachment/12349553/smallreadseek3.patch applied and successfully tested against trunk revision r499156.

          Doug Cutting added a comment -

          The call to skipTo() will still, in the worst case, cause an entire block to be streamed across the wire, using a lot of network bandwidth. Before we commit this I'd like to see some benchmarks showing that this is faster than closing and re-opening the connection.

          Note also that the stream buffering code already performs a similar optimization: if a 100k buffer is used, and one seeks within the buffer, then no i/o is performed on the underlying stream. So seeks of the underlying stream are generally at least a few k bytes away.
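
To make the buffering point concrete, here is a small hypothetical model (not Hadoop's actual buffering code) of a buffered stream that serves a seek from its buffer when the target offset falls inside it, and only otherwise touches the underlying stream. The class `BufferedSeekSketch` and its members are assumptions made for this sketch.

```java
final class BufferedSeekSketch {
    private long bufferStart;     // file offset of the first buffered byte
    private int bufferLength;     // number of valid bytes in the buffer
    private int bufferPos;        // index of the next byte to return
    private int underlyingSeeks;  // how often the underlying stream was touched

    BufferedSeekSketch(long bufferStart, int bufferLength) {
        this.bufferStart = bufferStart;
        this.bufferLength = bufferLength;
    }

    /** Returns true if the seek was satisfied from the buffer (no I/O). */
    boolean seek(long target) {
        long bufferEnd = bufferStart + bufferLength;
        if (target >= bufferStart && target < bufferEnd) {
            // Target is already buffered: just move the in-buffer position.
            bufferPos = (int) (target - bufferStart);
            return true;
        }
        // Otherwise discard the buffer; a real stream would now seek
        // the underlying stream and refill on the next read.
        bufferStart = target;
        bufferLength = 0;
        bufferPos = 0;
        underlyingSeeks++;
        return false;
    }

    long position() {
        return bufferStart + bufferPos;
    }

    int underlyingSeeks() {
        return underlyingSeeks;
    }
}
```

With a 100k buffer starting at offset 0, a seek to offset 4096 stays inside the buffer, while a seek to offset 204800 falls through to the underlying stream.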

          dhruba borthakur added a comment -

          I agree with your comments. The amount of data cached by the receiving side of the TCP connection could possibly depend on the latency of transfer and the amount of memory available to the sender and receiver.

          By default, the TCP sending window size is usually 128KB and the receiving window size is 4MB. I propose that I change the above patch to trigger the optimization only if the skip length is <= 128KB.

          Doug Cutting added a comment -

          > I propose that I change the above patch to trigger the optimization only if the skip length is <= 128KB.

          +1

          dhruba borthakur added a comment -

          One change from the previously submitted patch: allow the optimization only if the skip-length is within 128KB range. (The TCP receive window size is typically more than this limit).
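
The rule agreed in this thread can be sketched as a simple predicate. This is an illustration under assumptions, not the committed patch: the class `SeekPolicySketch`, the method `shouldSkipForward`, and the convention that `blockEnd` is the exclusive end offset of the current block are all hypothetical. Only the 128KB limit and the "forward seek within the current block" condition come from the discussion.

```java
final class SeekPolicySketch {
    /** Skip-length limit taken from the discussion (128KB). */
    static final long MAX_SKIP_BYTES = 128L * 1024;

    /**
     * Decides whether a seek can be served by skipping forward on the
     * open datanode connection instead of re-opening it.
     *
     * @param currentPos current offset in the file
     * @param targetPos  offset being sought
     * @param blockEnd   exclusive end offset of the current block (assumed convention)
     */
    static boolean shouldSkipForward(long currentPos, long targetPos, long blockEnd) {
        long diff = targetPos - currentPos;
        return diff >= 0                  // forward seeks only
            && diff <= MAX_SKIP_BYTES     // short enough to skip cheaply
            && targetPos < blockEnd;      // still inside the current block
    }
}
```

Under this sketch a 64KB forward seek inside a 128MB block would reuse the connection, whereas the 100MB seek raised earlier in the thread would fall back to re-opening it.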

          Doug Cutting added a comment -

          I just committed this. Thanks, Dhruba!

          Hadoop QA added a comment -

          -1, because the patch command could not apply the latest attachment (http://issues.apache.org/jira/secure/attachment/12349962/smallreadseek4.patch) as a patch to trunk revision r501616. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.


            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0
              Watchers:
              0
