  1. Nutch
  2. NUTCH-1089

Short compressed pages cause an exception

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Patch Info:
      Patch Available

      Description

      Hi,

      I tested Nutch on compressed pages, and on pages that use both Basic Auth and compression. On short compressed pages, this exception is thrown:

      2011-08-19 17:06:55,190 ERROR httpclient.Http - java.io.IOException: unzipBestEffort returned null
      2011-08-19 17:06:55,190 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:310)
      2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:163)
      2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
      2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
      2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)

      In some cases Basic Auth also fails.

      It works fine with the attached patch.
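
      For anyone trying to reproduce this outside of a crawl, here is a minimal, self-contained sketch of what "best effort" gunzip means and when it can come back with null. It is not the Nutch code and not the patch: the class BestEffortGunzip and its methods gzip and gunzipBestEffort are hypothetical names made up for illustration, and the report itself does not say why short pages hit the error. The sketch assumes only that the helper returns null when not a single byte could be decoded, for example when the gzip header is cut off.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not taken from Nutch or from the attached patch.
final class BestEffortGunzip {

    /** Gzip-compresses a byte array (helper for building sample input). */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /**
     * Decodes as many bytes as possible from a gzip stream.
     * Returns null only when nothing at all could be decoded.
     */
    static byte[] gunzipBestEffort(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Truncated or corrupt input: keep whatever was decoded so far.
        }
        return out.size() == 0 ? null : out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] page = "short page".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(page);

        // A complete stream, however short, should round-trip.
        byte[] whole = gunzipBestEffort(packed);
        System.out.println("complete stream decoded: " + Arrays.equals(page, whole));

        // Only 5 bytes: the gzip header is incomplete, so nothing is decoded.
        byte[] cut = gunzipBestEffort(Arrays.copyOf(packed, 5));
        System.out.println("truncated header gives null: " + (cut == null));
    }
}
```

      If a complete short page decodes fine in isolation like this, the null in the log would point at the bytes handed to the decoder (for example an incomplete body) rather than at the page being short as such. That is only a guess from the stack trace, not something confirmed in this report.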

        Attachments

        1. HttpResponsePatch.patch
          0.8 kB
          simone frenzel

              People

              • Assignee:
                Unassigned
              • Reporter:
                psimone simone frenzel