NUTCH-2555 (sub-task of NUTCH-2549: protocol-http does not behave the same as browsers)

URL normalization problem: path not starting with a '/'


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: None
    • Labels: None

    Description

      When a URL has no path but does have a query string (for instance 'http://example.com?a=1'), it should be normalized by adding a '/' as the path (giving 'http://example.com/?a=1'). Our logs show that non-normalized URLs reach protocol-http, which uses URL::getFile() to obtain the path and then tries to send an invalid HTTP request:

      GET ?a=1 HTTP/1.0

      instead of

      GET /?a=1 HTTP/1.0

      Example URL for which this poses a problem: http://news.fx678.com?171
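
The behaviour described above can be reproduced with the JDK alone. The sketch below is only an illustration and not the fix that was committed to Nutch: the class name EmptyPathSketch and the helpers fileOf and normalizeEmptyPath are hypothetical names introduced here. It shows what java.net.URL.getFile() returns before and after an empty path is replaced by '/'.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical illustration class; none of these names come from Nutch.
final class EmptyPathSketch {

    // Returns what java.net.URL.getFile() reports for the given URL string
    // (the path followed by "?query" when a query is present).
    static String fileOf(String urlString) {
        try {
            return URI.create(urlString).toURL().getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + urlString, e);
        }
    }

    // Assumed normalization rule: when a hierarchical URL has an authority
    // but an empty path, use "/" as the path. Everything else is left as is.
    static String normalizeEmptyPath(String urlString) {
        URI uri = URI.create(urlString);
        String path = uri.getRawPath();
        if (uri.isOpaque() || uri.getRawAuthority() == null
                || path == null || !path.isEmpty()) {
            return urlString; // nothing to fix
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority()).append('/');
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "http://example.com?a=1";
        String normalized = normalizeEmptyPath(raw);
        // Request line built from the raw URL (invalid: no leading '/').
        System.out.println("GET " + fileOf(raw) + " HTTP/1.0");
        // Request line built from the normalized URL.
        System.out.println("GET " + fileOf(normalized) + " HTTP/1.0");
    }
}
```

The sketch builds the result from the raw (still encoded) URI components so that existing percent-escapes are not encoded a second time. Whether the real fix belongs in a URL normalizer plugin or in protocol-http itself is not something this example tries to settle.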

          People

            Assignee: Unassigned
            Reporter: Gerard Bouchar (gbouchar)
            Votes: 0
            Watchers: 3
