Details
Type: Task
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 4.5.13
Fix Version/s: None
Component/s: None
Environment
httpclient: 4.5.13
httpcore: 4.4.14
Java 11 (archaic): openjdk version "11.0.4" 2019-07-16
Description
I'm recrawling truncated files from CommonCrawl in support of work on Apache Tika, and I've found a few files that I can download successfully with wget, but with httpclient I get:
org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 216,481; received: 203,820)
	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
	at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
	at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
	at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
	at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
	at java.base/java.util.zip.InflaterInputStream.close(InflaterInputStream.java:232)
	at java.base/java.util.zip.GZIPInputStream.close(GZIPInputStream.java:137)
	at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
	at FetcherTest.testBasic(FetcherTest.java:40)
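The trace shows the failure surfacing from close(), not from the copy itself. A simplified, JDK-only sketch of how that can happen is below. LengthDelimitedStream and ShortGzipBodyDemo are hypothetical names, and LengthDelimitedStream is not HttpClient's ContentLengthInputStream: it only mimics the two behaviours visible in the trace, checking the byte count when the connection ends early and draining the rest of the body on close(). The demo assumes the server declared a larger Content-Length than it actually sent for a gzip-encoded body; that is a hypothesis consistent with the GZIPInputStream frames above, not a confirmed diagnosis.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, simplified stand-in for a Content-Length delimited body stream.
class LengthDelimitedStream extends FilterInputStream {
    private final long declared; // value of the Content-Length header
    private long received;       // bytes actually delivered so far
    private boolean closed;

    LengthDelimitedStream(InputStream in, long declared) {
        super(in);
        this.declared = declared;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (received >= declared) {
            return -1; // the whole declared body has been consumed
        }
        int n = in.read(b, off, (int) Math.min(len, declared - received));
        if (n == -1) {
            // The connection ended before Content-Length bytes arrived.
            throw new IOException("Premature end of Content-Length delimited message body (expected: "
                    + declared + "; received: " + received + ")");
        }
        received += n;
        return n;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), declared - received);
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        try {
            // Drain the remaining body; a short body is only detected here.
            byte[] skip = new byte[2048];
            while (read(skip, 0, skip.length) != -1) { }
        } finally {
            in.close();
        }
    }
}

public class ShortGzipBodyDemo {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = gzip("hello, tika".getBytes(StandardCharsets.UTF_8));
        // Pretend the server declared 100 more bytes than it actually sent.
        GZIPInputStream gis = new GZIPInputStream(
                new LengthDelimitedStream(new ByteArrayInputStream(wire), wire.length + 100));
        byte[] decoded = gis.readAllBytes(); // succeeds: the gzip trailer is complete
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
        try {
            gis.close(); // closes the wrapped stream, which drains and hits the early EOF
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running ShortGzipBodyDemo prints the decoded text first and the premature-end message second: GZIPInputStream stops at the gzip trailer without asking the underlying stream for more bytes, so only the drain in close() notices the shortfall.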
The triggering file is: https://direitosculturais.com.br/pdf.php?id=151
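Example, with all defaults:
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;

String url = "https://direitosculturais.com.br/pdf.php?id=151";
HttpClient client = HttpClientBuilder.create().build();
HttpGet get = new HttpGet(url);
HttpResponse r = client.execute(get);
Path output = Paths.get("/data/tmp.pdf");
// Per the stack trace, the exception is thrown when the stream is closed at the end of this block
try (InputStream is = r.getEntity().getContent()) {
    Files.copy(is, output, StandardCopyOption.REPLACE_EXISTING);
}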
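One way to compare against wget is to fetch the same URL without requesting compression and check the body length against Content-Length. The sketch below uses the JDK 11 java.net.http client rather than HttpClient 4.x, since that client does not add an Accept-Encoding header on its own; IdentityFetchCheck and lengthMatches are hypothetical names. It assumes the mismatch is tied to the gzip-encoded response; if the uncompressed response also comes up short, that assumption is wrong. Within HttpClient 4.x, HttpClientBuilder.create().disableContentCompression() is the analogous switch for stopping the client from requesting and decoding compressed bodies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalLong;

public class IdentityFetchCheck {

    // True if the body length agrees with Content-Length, or no Content-Length was sent.
    public static boolean lengthMatches(HttpHeaders headers, byte[] body) {
        OptionalLong declared = headers.firstValueAsLong("Content-Length");
        return !declared.isPresent() || declared.getAsLong() == body.length;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "https://direitosculturais.com.br/pdf.php?id=151";
        HttpClient client = HttpClient.newHttpClient();
        // No Accept-Encoding header is set, so the server should reply uncompressed.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        HttpHeaders headers = response.headers();
        System.out.println("Content-Encoding: " + headers.firstValue("Content-Encoding").orElse("(none)"));
        System.out.println("Content-Length:   " + headers.firstValue("Content-Length").orElse("(none)"));
        System.out.println("bytes received:   " + response.body().length);
        System.out.println("lengths agree:    " + lengthMatches(headers, response.body()));
    }
}
```

Note that the JDK client may negotiate HTTP/2 with this server, in which case a Content-Length header is not guaranteed; lengthMatches treats a missing header as agreement.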