Description
When re-fetching a document (a continued crawl), HttpClient throws a NullPointerException, causing the document to be emptied:
2010-07-27 12:45:09,199 INFO fetcher.Fetcher - fetching http://localhost/doc/selfhtml/html/index.htm
2010-07-27 12:45:09,203 ERROR httpclient.Http - java.lang.NullPointerException
2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:138)
2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:220)
2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:537)
2010-07-27 12:45:09,204 INFO fetcher.Fetcher - fetch of http://localhost/doc/selfhtml/html/index.htm failed with: java.lang.NullPointerException
Because the document is being re-fetched, the server answers with "304" (Not Modified):
127.0.0.1 - - [27/Jul/2010:12:45:09 +0200] "GET /doc/selfhtml/html/index.htm HTTP/1.0" 304 174 "-" "Nutch-1.0"
No content is sent in this case (empty HTTP body).
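To illustrate the failure mode, here is a minimal sketch of null-safe body handling. It assumes the NullPointerException comes from using the response body stream without a null check when the server sends no body. Commons HttpClient 3.x documents that getResponseBodyAsStream() returns null in that case. The class EmptyBodySketch and the helper readBody are hypothetical names used only for this example. They are not part of Nutch and not necessarily what the patch below does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class EmptyBodySketch {

    /**
     * Hypothetical helper (not Nutch code): reads a response body that may be absent.
     * A null stream stands in for a response without a body, such as a 304.
     */
    static byte[] readBody(InputStream in) throws IOException {
        if (in == null) {
            // No body was sent: return empty content instead of dereferencing null.
            return new byte[0];
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            // Safe here because the null case returned before the try block.
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated 304 response without a body.
        System.out.println(readBody(null).length);
        // Simulated 200 response with a five-byte body.
        System.out.println(readBody(new ByteArrayInputStream("hello".getBytes("UTF-8"))).length);
    }
}
```

With a guard like this, a 304 response yields empty content rather than an exception, and the caller can then decide to keep the previously fetched copy.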
Index: trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
===================================================================
--- trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java (revision 979647)
+++ trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java (working copy)
@@ -134,7 +134,8 @@
if (code == 200) throw new IOException(e.toString());
// for codes other than 200 OK, we are fine with empty content
} finally