Description
bin/nutch indexchecker http://www.provinciegroningen.nl/actueel/dossiers/rwe-centrale
Fetch failed with protocol status: exception(16), lastModified=0: java.io.IOException: unzipBestEffort returned null
2013-10-01 13:44:55,612 INFO http.Http - http.proxy.host = null
2013-10-01 13:44:55,612 INFO http.Http - http.proxy.port = 8080
2013-10-01 13:44:55,612 INFO http.Http - http.timeout = 12000
2013-10-01 13:44:55,612 INFO http.Http - http.content.limit = 5242880
2013-10-01 13:44:55,612 INFO http.Http - http.agent = Mozilla/5.0 (compatible; OpenindexSpider; +http://www.openindex.io/en/webmasters/spider.html)
2013-10-01 13:44:55,612 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2013-10-01 13:44:55,613 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2013-10-01 13:44:55,925 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:86)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:150)
The exact cause is not known yet. The trace only shows that the failure happens in HttpBase.processGzipEncoded while decompressing the response body.
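One way to narrow this down might be to check whether the body returned for this URL is really gzip data. The following is a minimal stand-alone sketch, not the Nutch code: the class `GzipProbe` and its methods `looksLikeGzip`, `unzipBestEffort` and `gzip` are hypothetical names used only for illustration, and the null-on-empty behaviour is an assumption modelled on the error message above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipProbe {

    /** True if the buffer starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean looksLikeGzip(byte[] body) {
        return body != null && body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    /**
     * Hypothetical best-effort gunzip (an assumption, not Nutch's method):
     * returns whatever could be decompressed, or null if nothing could be.
     * Note that an empty but valid gzip payload also yields null here.
     */
    static byte[] unzipBestEffort(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(body))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Non-gzip or truncated input: keep what was decoded so far.
        }
        return out.size() == 0 ? null : out.toByteArray();
    }

    /** Helper that gzips a buffer, used only to build test input. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "<html>not compressed</html>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(plain);

        // A body labelled gzip that is actually plain text fails both checks.
        System.out.println("plain looks like gzip:  " + looksLikeGzip(plain));
        System.out.println("plain unzips to null:   " + (unzipBestEffort(plain) == null));
        System.out.println("zipped looks like gzip: " + looksLikeGzip(zipped));
        System.out.println("zipped round-trips:     "
                + new String(unzipBestEffort(zipped), StandardCharsets.UTF_8));
    }
}
```

If the body saved from the failing URL does not pass a check like `looksLikeGzip`, that would suggest the server labels the response as gzip without actually compressing it, or that the body is altered before decompression. This is only one possible explanation.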
Attachments
Issue Links
- is related to
  NUTCH-1736 Can't fetch page if http response header contains Transfer-Encoding:chunked (Closed)
- relates to
  NUTCH-1089 short compressed pages caused Exception (Closed)