HttpComponents HttpClient / HTTPCLIENT-2176

Premature end of Content-Length delimited message body but works with wget

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 4.5.13
    • Fix Version/s: None
    • Component/s: HttpClient (classic)
    • Labels: None
    • Environment:
      httpclient: 4.5.13
      httpcore: 4.4.14

      java 11 (archaic): openjdk version "11.0.4" 2019-07-16

    Description

      I'm re-crawling truncated files from CommonCrawl in support of work on Apache Tika, and I've found a few files that download successfully with wget but fail with HttpClient, which throws:

      org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 216,481; received: 203,820)
      
      	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
      	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
      	at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
      	at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
      	at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
      	at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
      	at java.base/java.util.zip.InflaterInputStream.close(InflaterInputStream.java:232)
      	at java.base/java.util.zip.GZIPInputStream.close(GZIPInputStream.java:137)
      	at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
      	at FetcherTest.testBasic(FetcherTest.java:40)
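
The message above reports a byte-count mismatch: the response declared a Content-Length of 216,481 bytes, but the stream ended after 203,820. The trace shows the exception surfacing from `close()`, which reads the rest of the body before finishing. The check itself can be illustrated with a minimal, standard-library-only sketch. The class and method names (`LengthCheck`, `drain`, `drainExpecting`) are hypothetical and this is not HttpClient's actual implementation; it only assumes that a Content-Length delimited reader counts the bytes it receives and fails when EOF arrives early.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

// Hypothetical helper for illustration; not part of HttpClient.
class LengthCheck {

    /** Reads the stream to EOF and returns the number of bytes seen. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Drains the stream and throws if it ended before 'expected' bytes arrived. */
    static long drainExpecting(InputStream in, long expected) throws IOException {
        long received = drain(in);
        if (received < expected) {
            throw new EOFException(String.format(Locale.US,
                    "Premature end of body (expected: %,d; received: %,d; missing: %,d)",
                    expected, received, expected - received));
        }
        return received;
    }

    public static void main(String[] args) {
        // Simulate the sizes from the report: 203,820 bytes arrive, 216,481 were declared.
        InputStream body = new ByteArrayInputStream(new byte[203_820]);
        try {
            drainExpecting(body, 216_481);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the sizes from this report, the shortfall is 216,481 - 203,820 = 12,661 bytes.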
      	
      

      The triggering file is: https://direitosculturais.com.br/pdf.php?id=151

      Example using all defaults:

              String url = "https://direitosculturais.com.br/pdf.php?id=151";
              HttpClient client = HttpClientBuilder.create().build();
              HttpGet get = new HttpGet(url);
              HttpResponse r = client.execute(get);
              Path output = Paths.get("/data/tmp.pdf");
              try (InputStream is = r.getEntity().getContent()) {
                  Files.copy(is, output, StandardCopyOption.REPLACE_EXISTING);
              }
      

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 3