  1. Nutch
  2. NUTCH-247

Robot rules parser is too restrictive.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 1.0.0
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      If the agent name and the robots agents are not configured properly, the robot rules parser logs the problem with LOG.severe, even though it also corrects the problem itself.
      Later on, the fetcher thread checks for severe errors and stops if one has been logged.

      RobotRulesParser:

      if (agents.size() == 0) {
        agents.add(agentName);
        LOG.severe("No agents listed in 'http.robots.agents' property!");
      } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
        agents.add(0, agentName);
        LOG.severe("Agent we advertise (" + agentName + ") not listed first in 'http.robots.agents' property!");
      }

      Fetcher.FetcherThread:

      if (LogFormatter.hasLoggedSevere()) // something bad happened
        break;
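
To illustrate why a single severe message halts fetching, here is a minimal sketch built on java.util.logging. It is not Nutch's LogFormatter: the class SevereTracker and the method runLoop are hypothetical names invented for this example, and the loop only imitates the check performed in Fetcher.FetcherThread.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical stand-in for a "has anything been logged at SEVERE?" check.
class SevereTracker extends Handler {
    private volatile boolean loggedSevere = false;

    @Override
    public void publish(LogRecord record) {
        // Remember that a SEVERE (or higher) record passed through.
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            loggedSevere = true;
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    boolean hasLoggedSevere() {
        return loggedSevere;
    }

    // Imitates a worker loop: logs one message at the given level when it
    // reaches index problemAt, and stops as soon as the tracker has seen a
    // severe record. Returns the number of items processed.
    static int runLoop(Logger log, SevereTracker tracker, int items,
                       int problemAt, Level level) {
        int processed = 0;
        for (int i = 0; i < items; i++) {
            if (tracker.hasLoggedSevere()) { // something bad happened
                break;
            }
            if (i == problemAt) {
                log.log(level, "Agent configuration problem (already corrected)");
            }
            processed++;
        }
        return processed;
    }
}
```

In this sketch, a message logged at Level.SEVERE ends the loop on the next iteration, while the same message logged at Level.WARNING lets all items be processed.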

      I suggest logging this problem with warn or a similar level instead of severe.
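
The suggested change could look roughly like the following sketch. It is not the attached patch: the class AgentListNormalizer and its normalize method are hypothetical, and a java.util.logging Logger is assumed because the quoted code calls LOG.severe. The correction of the agent list is kept as in the quoted code; only the log level differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: applies the same correction as the
// quoted RobotRulesParser code but reports it as a warning.
class AgentListNormalizer {

    // Returns a copy of the configured agents in which agentName is
    // guaranteed to be the first entry.
    static List<String> normalize(List<String> configured, String agentName,
                                  Logger log) {
        List<String> agents = new ArrayList<>(configured);
        if (agents.isEmpty()) {
            agents.add(agentName);
            log.warning("No agents listed in 'http.robots.agents' property!");
        } else if (!agents.get(0).equalsIgnoreCase(agentName)) {
            agents.add(0, agentName);
            log.warning("Agent we advertise (" + agentName
                + ") not listed first in 'http.robots.agents' property!");
        }
        return agents;
    }
}
```

Because the misconfiguration is repaired in place, a warning records it without tripping a check that looks only for severe messages.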

      Attachments

        1. agent-names3.patch.txt
          11 kB
          Dennis Kubes
        2. agent-names.patch
          9 kB
          Dennis Kubes

            People

              Assignee: Sami Siren (siren)
              Reporter: Stefan Groschupf (joa23)
              Votes: 1
              Watchers: 1