
http.agent.rotate: IllegalArgumentException / last element of agent names ignored

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10, 1.11, 2.3.1, 1.12
    • Fix Version/s: 2.4, 1.13
    • Component/s: protocol
    • Labels: None
    • Patch Info: Patch Available
    • Flags: Patch

    Description

      With http.agent.rotate == true and a one-element agent name list, the following exception is thrown:

      % cat .../conf/agents.txt
      my-test-crawler/Nutch-1.13
      % .../bin/nutch parsechecker -Dhttp.agent.rotate=true http://nutch.apache.org/
      ...
      Fetch failed with protocol status: exception(16), lastModified=0: java.lang.IllegalArgumentException: bound must be positive
      % cat .../logs/hadoop.log
      ...
      2017-03-03 11:17:19,750 ERROR http.Http - Failed to get protocol output
      java.lang.IllegalArgumentException: bound must be positive
              at java.util.concurrent.ThreadLocalRandom.nextInt(ThreadLocalRandom.java:352)
              at org.apache.nutch.protocol.http.api.HttpBase.getUserAgent(HttpBase.java:379)
              at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:180)
      ...
      

      Caused by

      userAgentNames.get(ThreadLocalRandom.current().nextInt(userAgentNames.size()-1));
      

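      but nextInt(...) is defined as: "Returns a pseudorandom int value between zero (inclusive) and the specified bound (exclusive)." With a one-element list the bound is size()-1 == 0, which nextInt rejects with the exception above. With longer lists the largest index returned is size()-2, so the last element of the agent names is never selected.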
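The off-by-one can be reproduced outside Nutch with a minimal sketch. The class name AgentRotation and the methods pickBuggy and pickFixed are hypothetical and only illustrate the quoted expression. Passing size() as the exclusive bound is the fix assumed here; the committed patch is not shown in this report and may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-alone reproduction; not the Nutch HttpBase code.
public class AgentRotation {

  // Mirrors the expression quoted in the report: the exclusive bound is size() - 1.
  // One-element list: bound 0, so nextInt throws IllegalArgumentException.
  // Longer lists: index size() - 1 (the last element) is never returned.
  static String pickBuggy(List<String> userAgentNames) {
    return userAgentNames.get(
        ThreadLocalRandom.current().nextInt(userAgentNames.size() - 1));
  }

  // Assumed fix: exclusive bound size() yields indices 0 .. size() - 1.
  // An empty list would still need to be handled by the caller.
  static String pickFixed(List<String> userAgentNames) {
    return userAgentNames.get(
        ThreadLocalRandom.current().nextInt(userAgentNames.size()));
  }

  public static void main(String[] args) {
    List<String> one = Arrays.asList("my-test-crawler/Nutch-1.13");
    try {
      pickBuggy(one);
    } catch (IllegalArgumentException e) {
      System.out.println("buggy variant failed: " + e.getMessage());
    }
    System.out.println("fixed variant picked: " + pickFixed(one));
  }
}
```

With a two-element list, pickBuggy always returns the first entry, which matches the "last element of agent names ignored" part of the summary.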

            People

              Assignee: Unassigned
              Reporter: Sebastian Nagel (snagel)