Nutch / NUTCH-815

Invalid blank line before If-Modified-Since HTTP header

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1
    • Component/s: None
    • Labels: None
    • Environment: Nutch 1.0.0, Windows XP, Java 1.6.0_17

      Description

      If a modified time is stored in the crawldb for a link, the class org.apache.nutch.protocol.http.HttpResponse uses it as the value of the If-Modified-Since header.

      Line 131:
      reqStr.append("\r\n"); // ends the header block with an empty line
      if (datum.getModifiedTime() > 0) {
        reqStr.append("If-Modified-Since: " + HttpDateFormat.toString(datum.getModifiedTime()));
        reqStr.append("\r\n");
      }
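      For reference, the header value is an HTTP date in GMT. The issue does not show how Nutch's HttpDateFormat.toString is implemented, so the sketch below is a hypothetical stand-in: the class HttpDateSketch and its methods httpDate and ifModifiedSinceLine are illustrative names, and it uses java.time (Java 8+), not the Java 1.6 of the reported environment. It formats an epoch-millisecond timestamp in the fixed "EEE, dd MMM yyyy HH:mm:ss GMT" layout and, like the snippet above, emits the header only when the modified time is positive.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical stand-in for HttpDateFormat.toString(long); not Nutch's code.
final class HttpDateSketch {
    // Fixed-length HTTP date: English names, two-digit day, always GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    private HttpDateSketch() {}

    /** Formats epoch milliseconds as an HTTP date in GMT. */
    static String httpDate(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Returns the CRLF-terminated header line, or "" when no modified time is known. */
    static String ifModifiedSinceLine(long modifiedTime) {
        if (modifiedTime > 0) {
            return "If-Modified-Since: " + httpDate(modifiedTime) + "\r\n";
        }
        return "";
    }

    public static void main(String[] args) {
        // Prints "If-Modified-Since: Tue, 27 Apr 2010 13:51:50 GMT" followed by CRLF.
        System.out.print(ifModifiedSinceLine(1272376310000L));
    }
}
```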

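      The problem is that an extra blank line is inserted before this header. In HTTP/1.x an empty line ends the header section, so the If-Modified-Since line that follows it is no longer part of the headers: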
      ----------------------------------------------------------------------------------
      GET /tinysite/second.html HTTP/1.0
      Host: localhost:8080
      Accept-Encoding: x-gzip, gzip, deflate
      User-Agent: nutch/Nutch-1.0
      Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3

      If-Modified-Since: Tue, 27 Apr 2010 13:51:50 GMT
      ----------------------------------------------------------------------------------
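      To see why the dump above loses the header, here is a deliberately simplified sketch of how a server reads the request head: it collects header fields until the first empty line and stops there. HeaderBlockSketch and parseHeaders are hypothetical names; a real server's parser is more involved (case-insensitive names, folding, limits), and what it does with the bytes left after the empty line varies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch: header fields end at the first empty line (CRLF CRLF).
final class HeaderBlockSketch {
    private HeaderBlockSketch() {}

    /** Returns the "Name: value" fields that appear before the first empty line. */
    static Map<String, String> parseHeaders(String request) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = request.split("\r\n", -1);
        // lines[0] is the request line, e.g. "GET /tinysite/second.html HTTP/1.0".
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // end of the header section; later lines are not headers
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String buggy = "GET /tinysite/second.html HTTP/1.0\r\n"
            + "Host: localhost:8080\r\n"
            + "\r\n" // the misplaced empty line
            + "If-Modified-Since: Tue, 27 Apr 2010 13:51:50 GMT\r\n";
        System.out.println(parseHeaders(buggy).keySet()); // prints [Host]
    }
}
```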

      I'm using the AdaptiveFetchSchedule to set the modified time in the crawldb.

      I tested moving line 131 after the if block, and it works. I think that is where the line belongs.
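      A minimal sketch of the proposed reordering, assuming a request head built with a StringBuilder as in the snippet above: every header, including the optional If-Modified-Since, is appended first, and the single empty line that ends the header block comes last. RequestHeadSketch and buildRequestHead are hypothetical names, the header set is trimmed to Host for brevity, and the date is passed in already formatted; this is not the actual Nutch patch.

```java
// Sketch of the fix: the terminating CRLF is appended after the conditional header.
final class RequestHeadSketch {
    private RequestHeadSketch() {}

    /** Builds a request head; ifModifiedSince is a preformatted HTTP date or null. */
    static String buildRequestHead(String path, String host, String ifModifiedSince) {
        StringBuilder reqStr = new StringBuilder();
        reqStr.append("GET ").append(path).append(" HTTP/1.0\r\n");
        reqStr.append("Host: ").append(host).append("\r\n");
        if (ifModifiedSince != null) {
            reqStr.append("If-Modified-Since: ").append(ifModifiedSince).append("\r\n");
        }
        // Moved here (formerly line 131): the empty line now ends the header block.
        reqStr.append("\r\n");
        return reqStr.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRequestHead("/tinysite/second.html", "localhost:8080",
                "Tue, 27 Apr 2010 13:51:50 GMT"));
    }
}
```

With this order the only CRLF CRLF sequence is at the very end of the head, so the conditional header stays inside the header section.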


            People

            • Assignee: Andrzej Bialecki
            • Reporter: Pascal Dimassimo
            • Votes: 0
            • Watchers: 1
