  1. Nutch
  2. NUTCH-2549 protocol-http does not behave the same as browsers
  3. NUTCH-2564

protocol-http throws an error when the content-length header is not a number


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed

    Description

      When a server sends an invalid Content-Length header (one whose value is not a valid number) with a plain-text HTTP body, browsers simply ignore it. protocol-http handles this case inconsistently: if the header value consists only of whitespace, it is ignored, but if it contains any other characters, protocol-http throws an error, which prevents us from doing anything with the page.

      It should simply ignore invalid Content-Length headers.

       

      Relevant code: https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359
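
      The lenient behaviour requested above could look roughly like the sketch below. This is not the code from HttpResponse.java and not the committed fix; the class name LenientContentLength, the method parse and the UNKNOWN sentinel are hypothetical names chosen for illustration. The idea is that any value that is not a non-negative number is treated as if the header were absent, so the caller can fall back to reading the body until the connection closes.

```java
public final class LenientContentLength {

    /** Hypothetical sentinel: no usable Content-Length was supplied. */
    public static final long UNKNOWN = -1L;

    private LenientContentLength() {
    }

    /**
     * Parses a Content-Length header value leniently.
     * Returns the length if the value is a non-negative number,
     * and UNKNOWN for null, blank, negative or non-numeric values
     * instead of throwing.
     */
    public static long parse(String headerValue) {
        if (headerValue == null) {
            return UNKNOWN;
        }
        String trimmed = headerValue.trim();
        if (trimmed.isEmpty()) {
            // Whitespace-only header: ignore it.
            return UNKNOWN;
        }
        try {
            long length = Long.parseLong(trimmed);
            // A negative length is meaningless, so ignore it as well.
            return length >= 0 ? length : UNKNOWN;
        } catch (NumberFormatException e) {
            // Garbage such as "abc" or "12 bytes": ignore rather than fail the fetch.
            return UNKNOWN;
        }
    }
}
```

      With this approach, whitespace-only values and values containing other invalid characters take the same path, which removes the inconsistency described above.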

       

          People

            Assignee: Unassigned
            Reporter: Gerard Bouchar (gbouchar)
            Votes: 0
            Watchers: 2
