NUTCH-2386

BasicURLNormalizer does not encode curly braces

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      Curly braces left unencoded in a URL cause the fetcher to fail with:

      2017-05-15 13:23:33,474 ERROR [FetcherThread] org.apache.nutch.protocol.httpclient.Http: Failed to get protocol output
      java.lang.IllegalArgumentException: Invalid uri 'https://www.example.org/32/{{relative_url}}': escaped absolute path not valid
      	at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
      	at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
      	at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:76)
      	at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:181)
      	at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
      	at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:295)
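
      To illustrate the failure mode in the trace above, here is a minimal sketch of percent-encoding curly braces before a URL reaches the HTTP client. It is not the attached patch and not BasicURLNormalizer's code: the class name CurlyBraceEscapeSketch and the methods escapeCurlyBraces and isValidUri are hypothetical, and java.net.URI is used only as a stand-in strict parser to show that the raw URL is rejected while the escaped one is accepted.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Nutch.
class CurlyBraceEscapeSketch {

    // Replace each '{' with %7B and each '}' with %7D, leaving all other characters untouched.
    static String escapeCurlyBraces(String url) {
        StringBuilder sb = new StringBuilder(url.length() + 8);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '{') {
                sb.append("%7B");
            } else if (c == '}') {
                sb.append("%7D");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: java.net.URI stands in for a strict URI parser here.
    static boolean isValidUri(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "https://www.example.org/32/{{relative_url}}";
        String escaped = escapeCurlyBraces(raw);
        System.out.println(escaped);
        // prints https://www.example.org/32/%7B%7Brelative_url%7D%7D
        System.out.println(isValidUri(raw) + " " + isValidUri(escaped));
        // prints false true
    }
}
```

      A real normalizer would also need to handle other unsafe characters and avoid double-encoding existing escapes; this sketch covers only the two characters named in the issue title.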
      

      Attachments

        1. NUTCH-2386.patch
          2 kB
          Markus Jelsma

          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Markus Jelsma (markus17)
            Votes: 0
            Watchers: 2
