Fetcher dies because of "max. redirects" (avoiding infinite loop)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9.0
    • Component/s: fetcher
    • Labels: None
    • Environment: nightly-2006-05-20

    Description

      Error in the logs is:
      060521 213401 SEVERE Narrowly avoided an infinite loop in execute
      org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded
      at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:183)
      at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
      at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
      at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:87)
      at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:97)
      at org.apache.nutch.protocol.http.api.RobotRulesParser.isAllowed(RobotRulesParser.java:394)
      at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:173)
      at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)

      This happens during normal crawling. Unfortunately I don't know how to track this down further. It is problematic because it actually makes the fetcher die.

      A workaround for the symptom is in NUTCH-258 (avoid dying on a SEVERE log entry). That works for me: crawling works fine and does not hang or crash. I know this works around the problem rather than solving it, but it helps for the moment.

      I hope somebody can help; this looks quite important to track down to me.
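
      To illustrate the behaviour being asked for, here is a minimal, self-contained sketch of a redirect cap that fails a single URL instead of the whole fetch run. The trace above shows the exception surfacing through RobotRulesParser.isAllowed, so in this sketch the failing lookup is simply skipped. Everything here is hypothetical: the class RedirectGuardSketch, the methods resolve and fetchAll, and TooManyRedirectsException are illustration names, not Nutch or Commons HttpClient API, and an in-memory map stands in for real HTTP responses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Nutch code and not the Commons HttpClient API.
public class RedirectGuardSketch {

    // Stand-in for a "maximum redirects exceeded" failure.
    static class TooManyRedirectsException extends Exception {
        TooManyRedirectsException(String message) {
            super(message);
        }
    }

    // Follows entries of the redirect table until a URL has no redirect.
    // Allows at most maxRedirects hops; one more hop raises the exception.
    static String resolve(Map<String, String> redirects, String url, int maxRedirects)
            throws TooManyRedirectsException {
        String current = url;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            String next = redirects.get(current);
            if (next == null) {
                return current; // no further redirect: this is the final URL
            }
            current = next;
        }
        throw new TooManyRedirectsException(
                "Maximum redirects (" + maxRedirects + ") exceeded for " + url);
    }

    // Resolves every URL; a redirect loop fails only that URL, not the run.
    static List<String> fetchAll(Map<String, String> redirects, List<String> urls,
                                 int maxRedirects) {
        List<String> resolved = new ArrayList<>();
        for (String url : urls) {
            try {
                resolved.add(resolve(redirects, url, maxRedirects));
            } catch (TooManyRedirectsException e) {
                // Log as a warning and continue with the next URL.
                System.err.println("WARN skipping " + url + ": " + e.getMessage());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://a.example/", "http://b.example/"); // a -> b
        redirects.put("http://b.example/", "http://a.example/"); // b -> a (loop)
        redirects.put("http://c.example/", "http://d.example/"); // c -> d

        List<String> urls = new ArrayList<>();
        urls.add("http://a.example/");
        urls.add("http://c.example/");

        System.out.println(fetchAll(redirects, urls, 100));
    }
}
```

      The design choice is that exceeding the cap is reported per URL and the loop over the remaining URLs continues. This mirrors the effect of the NUTCH-258 workaround, but it does not explain why the redirect loop occurs in the first place.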

            People

              Assignee: Andrzej Bialecki (ab)
              Reporter: Stefan Neufeind (neufeind)