Commons VFS / VFS-805

HTTP seek always exhausts response

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0
    • None

    Description

      Seeking on an HTTP resource always downloads the entire content if a Content-Length header is present. Seeking closes the current input stream, and that close eventually ends up in ContentLengthInputStream.close() of the (ancient) commons-httpclient 3.1 library.

      To be clear, the problem is actually not with the seek itself, but with the underlying close implementation that always exhausts the HTTP response body. See the example below.

       

      My use case is binary search over sorted datasets on the Web (RDF data in sorted N-Triples syntax). The binary search works locally and in principle also works on HTTP resources abstracted with VFS2, but a seek implementation that downloads all of the data (several GB in my case) defeats the purpose.
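
For readers unfamiliar with this access pattern, here is a minimal sketch of seek-based binary search over a sorted, line-oriented file. It is not the reporter's code: it uses a local java.io.RandomAccessFile as a stand-in for the VFS RandomAccessContent, and the class name SortedLineSearch, the method findLine, and the sample data are hypothetical. It assumes ASCII lines sorted in String.compareTo order.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SortedLineSearch {

    /**
     * Returns the byte offset of the line equal to key, or -1 if absent.
     * Assumes the file holds ASCII lines in ascending compareTo order.
     */
    public static long findLine(RandomAccessFile f, String key) throws IOException {
        long lo = 0;          // always the start of a line
        long hi = f.length(); // upper bound on where the search may probe
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (mid == 0) {
                f.seek(0);
            } else {
                // Seek one byte early and discard up to the next newline, so the
                // pointer lands on the first line that starts at or after mid.
                f.seek(mid - 1);
                f.readLine();
            }
            String line = f.readLine();
            if (line != null && line.compareTo(key) < 0) {
                lo = f.getFilePointer(); // the match, if any, starts after this line
            } else {
                hi = mid;                // the match, if any, starts at or before this line
            }
        }
        f.seek(lo);
        String candidate = f.readLine();
        return key.equals(candidate) ? lo : -1;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sorted", ".txt");
        try {
            Files.write(tmp, "apple\nbanana\ncherry\ndate\nfig\n".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
                System.out.println(findLine(f, "cherry"));  // prints 13
                System.out.println(findLine(f, "coconut")); // prints -1
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Each probe is one seek followed by at most two short line reads, so the search touches only a logarithmic number of positions. That is only useful if a seek costs roughly the same regardless of file size.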

       

      From org.apache.commons.httpclient.ContentLengthInputStream (commons-httpclient-3.1):

          public void close() throws IOException {
              if (!closed) {
                  try {
                      ChunkedInputStream.exhaustInputStream(this);
                  } finally {
                      // close after above so that we don't throw an exception trying
                      // to read after closed!
                      closed = true;
                  }
              }
          }
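
To illustrate the effect of a close() like the one quoted above, here is a small self-contained sketch. It does not use commons-httpclient: DrainingStream is a hypothetical stand-in that mimics the drain-on-close behaviour, and an in-memory byte array plays the role of the HTTP response body. The counter shows how many bytes are pulled from the source in total when only the first 100 are wanted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainOnCloseDemo {

    /** Hypothetical stand-in for a body stream whose close() reads to the end. */
    static class DrainingStream extends FilterInputStream {
        long bytesPulled = 0;          // total bytes taken from the underlying source
        private boolean closed = false;

        DrainingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                bytesPulled++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesPulled += n;
            }
            return n;
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                try {
                    // Same idea as the quoted close(): consume whatever is left.
                    byte[] scratch = new byte[1024];
                    while (read(scratch, 0, scratch.length) >= 0) {
                        // discard
                    }
                } finally {
                    closed = true;
                }
            }
        }
    }

    /** Reads bytesWanted bytes, closes the stream, and reports the total bytes pulled. */
    public static long bytesPulledAfterClose(int bodySize, int bytesWanted) throws IOException {
        DrainingStream s = new DrainingStream(new ByteArrayInputStream(new byte[bodySize]));
        byte[] wanted = new byte[bytesWanted];
        s.read(wanted, 0, wanted.length);
        s.close();
        return s.bytesPulled;
    }

    public static void main(String[] args) throws IOException {
        // Only 100 bytes are wanted, yet close() pulls the whole 1,000,000-byte body.
        System.out.println(bytesPulledAfterClose(1_000_000, 100)); // prints 1000000
    }
}
```

With a real HTTP response the remaining bytes come over the network, so the cost of close() grows with the size of the resource rather than with the amount of data actually requested.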
      

      Example:

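      	import java.util.concurrent.TimeUnit;

      	import org.apache.commons.lang3.time.StopWatch;
      	import org.apache.commons.vfs2.FileObject;
      	import org.apache.commons.vfs2.FileSystemManager;
      	import org.apache.commons.vfs2.RandomAccessContent;
      	import org.apache.commons.vfs2.VFS;
      	import org.apache.commons.vfs2.util.RandomAccessMode;

      	// Class name is illustrative; the original report showed only main().
      	public class HttpSeekExample {
      		public static void main(String[] args) throws Exception {
      			String url = "http://localhost/large-file-2gb.txt";
      			FileSystemManager fsManager = VFS.getManager();

      			try (FileObject file = fsManager.resolveFile(url)) {
      				try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {

      					// First seek: no stream is open yet, so nothing has to be closed.
      					StopWatch sw1 = StopWatch.createStarted();
      					r.seek(20);
      					System.out.println("Initial seek: " + sw1.getTime(TimeUnit.MILLISECONDS));

      					// Reading opens the HTTP response stream.
      					StopWatch sw2 = StopWatch.createStarted();
      					byte[] bytes = new byte[100];
      					r.readFully(bytes);
      					System.out.println("Read: " + sw2.getTime(TimeUnit.MILLISECONDS));

      					// Second seek: closes the open stream, which drains the rest of the body.
      					StopWatch sw3 = StopWatch.createStarted();
      					r.seek(100);
      					System.out.println("Subsequent seek: " + sw3.getTime(TimeUnit.MILLISECONDS));
      				}
      			}
      			System.out.println("Done");
      		}
      	}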
      

      Output (times in milliseconds):

      Initial seek: 0
      Read: 4
      Subsequent seek: 2538
      Done
      

       

            People

              Assignee: Unassigned
              Reporter: Claus Stadler (Aklakan)