NUTCH-2364 (Apache Nutch)

http.agent.rotate: IllegalArgumentException / last element of agent names ignored

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10, 1.11, 2.3.1, 1.12
    • Fix Version/s: 2.4, 1.13
    • Component/s: protocol
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Flags:
      Patch

      Description

      With http.agent.rotate set to true and an agent name list containing only one element, the following exception is thrown:

      % cat .../conf/agents.txt
      my-test-crawler/Nutch-1.13
      % .../bin/nutch parsechecker -Dhttp.agent.rotate=true http://nutch.apache.org/
      ...
      Fetch failed with protocol status: exception(16), lastModified=0: java.lang.IllegalArgumentException: bound must be positive
      % cat .../logs/hadoop.log
      ...
      2017-03-03 11:17:19,750 ERROR http.Http - Failed to get protocol output
      java.lang.IllegalArgumentException: bound must be positive
              at java.util.concurrent.ThreadLocalRandom.nextInt(ThreadLocalRandom.java:352)
              at org.apache.nutch.protocol.http.api.HttpBase.getUserAgent(HttpBase.java:379)
              at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:180)
      ...
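      A minimal stand-alone reproduction of the failure outside Nutch. The class name AgentRotationRepro and the method buggyPick are hypothetical, and the agent names are assumed to be held in a List<String>, as in the selection expression from HttpBase.getUserAgent quoted in this issue. The same sketch also shows the second symptom from the summary: the last list element is never picked.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class AgentRotationRepro {

  // Hypothetical helper mirroring the selection expression quoted in this issue (buggy version).
  static String buggyPick(List<String> userAgentNames) {
    // nextInt(bound) returns a value in [0, bound): size()-1 excludes the last index,
    // and for a one-element list the bound becomes 0, which nextInt rejects.
    return userAgentNames.get(ThreadLocalRandom.current().nextInt(userAgentNames.size() - 1));
  }

  public static void main(String[] args) {
    // Same single agent name as in the agents.txt shown above.
    List<String> one = Collections.singletonList("my-test-crawler/Nutch-1.13");
    try {
      buggyPick(one);
    } catch (IllegalArgumentException e) {
      System.out.println("one-element list: " + e);
    }

    // With three agents the bound is 2, so only indices 0 and 1 can ever be drawn.
    List<String> three = Arrays.asList("agent-a", "agent-b", "agent-c");
    boolean lastSeen = false;
    for (int i = 0; i < 10_000; i++) {
      if ("agent-c".equals(buggyPick(three))) {
        lastSeen = true;
      }
    }
    System.out.println("last agent ever chosen: " + lastSeen); // always false
  }
}
```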
      

      Caused by

      userAgentNames.get(ThreadLocalRandom.current().nextInt(userAgentNames.size()-1));
      

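      but nextInt(bound) is defined as: "Returns a pseudorandom int value between zero (inclusive) and the specified bound (exclusive)." Because the bound passed is userAgentNames.size()-1, the last agent name can never be selected, and for a one-element list the bound is 0, which nextInt rejects with "IllegalArgumentException: bound must be positive".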
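      A sketch of the corrected selection, assuming the fix is simply to pass userAgentNames.size() as the bound. This is the obvious correction implied by the nextInt contract, not necessarily the committed patch; the class name AgentRotationFix, the helper pickAgent and its empty-list guard are hypothetical additions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class AgentRotationFix {

  // Hypothetical helper: uses size() as the exclusive bound, so every index
  // 0..size()-1 is reachable and a one-element list no longer fails.
  static String pickAgent(List<String> userAgentNames) {
    // Assumed guard (not from the issue): nextInt(0) would throw for an empty list.
    if (userAgentNames.isEmpty()) {
      throw new IllegalStateException("no agent names configured");
    }
    return userAgentNames.get(ThreadLocalRandom.current().nextInt(userAgentNames.size()));
  }

  public static void main(String[] args) {
    System.out.println(pickAgent(Collections.singletonList("my-test-crawler/Nutch-1.13")));
    // prints my-test-crawler/Nutch-1.13

    // Count how often each agent is picked; every count is expected to be non-zero.
    List<String> three = Arrays.asList("agent-a", "agent-b", "agent-c");
    int[] counts = new int[three.size()];
    for (int i = 0; i < 30_000; i++) {
      counts[three.indexOf(pickAgent(three))]++;
    }
    System.out.println(Arrays.toString(counts));
  }
}
```

      With size() as the bound, nextInt can return every index from 0 to size()-1, so a one-element list always yields its only entry and no agent name is skipped.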

              People

              • Assignee: Unassigned
              • Reporter: Sebastian Nagel (snagel)
              • Votes: 0
              • Watchers: 5
