Nutch / NUTCH-2386

BasicURLNormalizer does not encode curly braces

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      BasicURLNormalizer leaves curly braces in URLs unencoded, which causes the fetch to fail with:

      2017-05-15 13:23:33,474 ERROR [FetcherThread] org.apache.nutch.protocol.httpclient.Http: Failed to get protocol output
      java.lang.IllegalArgumentException: Invalid uri 'https://www.example.org/32/{{relative_url}}': escaped absolute path not valid
      	at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
      	at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
      	at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:76)
      	at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:181)
      	at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
      	at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:295)
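
A minimal sketch of the expected behaviour, assuming the normalizer only needs to percent-encode `{` as `%7B` and `}` as `%7D` before the URL reaches the protocol plugin. The class and method names below are hypothetical; this is not the committed patch and does not use BasicURLNormalizer internals.

```java
class CurlyBraceEscaper {

    /**
     * Percent-encodes '{' and '}' and leaves every other character untouched.
     * Hypothetical helper for illustration only.
     */
    public static String escapeCurlyBraces(String url) {
        StringBuilder sb = new StringBuilder(url.length() + 8);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '{' || c == '}') {
                // '{' is 0x7B and '}' is 0x7D in ASCII
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // URL taken from the log line in this report
        String url = "https://www.example.org/32/{{relative_url}}";
        System.out.println(escapeCurlyBraces(url));
        // prints https://www.example.org/32/%7B%7Brelative_url%7D%7D
    }
}
```

With the braces encoded this way, the path no longer contains the raw `{` and `}` characters that appear in the rejected URL above.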
      

            People

            • Assignee:
              markus17 Markus Jelsma
              Reporter:
              markus17 Markus Jelsma
            • Votes:
              0
            • Watchers:
              2
