Nutch / NUTCH-862

HttpClient null pointer exception


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2, nutchgora
    • Component/s: fetcher
    • Labels: None
    • Environment: linux, java 6
    • Patch Info: Patch Available

    Description

      When re-fetching a document (a continued crawl), HttpClient throws a NullPointerException, causing the document to be emptied:

      2010-07-27 12:45:09,199 INFO fetcher.Fetcher - fetching http://localhost/doc/selfhtml/html/index.htm
      2010-07-27 12:45:09,203 ERROR httpclient.Http - java.lang.NullPointerException
      2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:138)
      2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
      2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:220)
      2010-07-27 12:45:09,204 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:537)
      2010-07-27 12:45:09,204 INFO fetcher.Fetcher - fetch of http://localhost/doc/selfhtml/html/index.htm failed with: java.lang.NullPointerException

      Because the document is re-fetched, the server answers "304" (Not Modified):

      127.0.0.1 - - [27/Jul/2010:12:45:09 +0200] "GET /doc/selfhtml/html/index.htm HTTP/1.0" 304 174 "-" "Nutch-1.0"

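      No content is sent in this case (empty HTTP body), so the response stream "in" is null when the finally block calls in.close(). The patch guards that call with a null check: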

      Index: trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
      ===================================================================
      --- trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java (revision 979647)
      +++ trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java (working copy)
      @@ -134,7 +134,8 @@
               if (code == 200) throw new IOException(e.toString());
               // for codes other than 200 OK, we are fine with empty content
             } finally {
      -        in.close();
      +        if (in != null)
      +          in.close();
               get.abort();
             }
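
The null check in the patch can be illustrated in isolation. The following is a minimal sketch, not the Nutch code: the class NullSafeClose and the method readBody are hypothetical names, and the handling is simplified to the one point at issue, namely that a response without a body (such as a 304) may yield a null stream, which must not be closed unconditionally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Nutch.
class NullSafeClose {

  // Reads a response body that may be absent (in == null).
  // Assumption for this sketch: a missing body is an error only for 200 OK.
  public static byte[] readBody(InputStream in, int code) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      if (in == null) {
        if (code == 200) throw new IOException("empty body for 200 OK");
        // For codes other than 200 we are fine with empty content.
        return new byte[0];
      }
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      // Same guard as in the patch: close only if a stream exists.
      if (in != null) {
        in.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // 304 Not Modified: no body, no stream, no exception.
    System.out.println(readBody(null, 304).length); // prints 0

    // 200 OK with a body is read and closed normally.
    InputStream body =
        new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(
        new String(readBody(body, 200), StandardCharsets.UTF_8)); // prints hello
  }
}
```

Without the guard in the finally block, the first call would fail with a NullPointerException, matching the stack trace in the description.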

      Attachments

        1. NUTCH-862.patch
          0.7 kB
          Sebastian Nagel


          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Sebastian Nagel (snagel)
            Votes: 1
            Watchers: 0
