Nutch / NUTCH-1927

Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10
    • Component/s: fetcher

    Description

      Based on discussion on the dev list, and to support valid security-research use cases for Nutch (DDoS, DNS, and other testing), I am going to create a patch that allows a whitelist:

      <property>
        <name>robot.rules.whitelist</name>
        <value>132.54.99.22,hostname.apache.org,foo.jpl.nasa.gov</value>
        <description>Comma-separated list of hostnames or IP addresses for which robot rules parsing will be skipped.
        </description>
      </property>
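
      Below is a minimal sketch of how a fetcher-side check could consult such a whitelist before applying robots.txt rules. The class name RobotRulesWhitelist and the isWhitelisted helper are illustrative assumptions, not the committed API; the property name simply follows the example above.

      import java.util.HashSet;
      import java.util.Set;

      import org.apache.hadoop.conf.Configuration;

      /** Illustrative sketch: decide whether robot rules parsing is skipped for a host. */
      public class RobotRulesWhitelist {

        private final Set<String> whitelist = new HashSet<String>();

        public RobotRulesWhitelist(Configuration conf) {
          // Property name taken from the description above (assumption, not the committed name).
          String[] entries = conf.getStrings("robot.rules.whitelist", new String[0]);
          for (String entry : entries) {
            whitelist.add(entry.trim().toLowerCase());
          }
        }

        /** Returns true if robot rules parsing should be skipped for this hostname or IP. */
        public boolean isWhitelisted(String hostOrIp) {
          return hostOrIp != null && whitelist.contains(hostOrIp.trim().toLowerCase());
        }
      }

      A fetcher would call isWhitelisted(url.getHost()) before fetching and parsing robots.txt, and treat whitelisted hosts as if all paths were allowed.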
      

      Attachments

        1. test_NUTCH-1927.2015-04-17.txt
          2 kB
          Sebastian Nagel
        2. NUTCH-1927.2015-04-17.patch
          17 kB
          Sebastian Nagel
        3. NUTCH-1927.2015-04-16.patch
          11 kB
          Sebastian Nagel
        4. NUTCH-1927.Mattmann.041415.patch.txt
          17 kB
          Chris A. Mattmann
        5. NUTCH-1927.Mattmann.041215.patch.txt
          14 kB
          Chris A. Mattmann
        6. NUTCH-1927.Mattmann.041115.patch.txt
          25 kB
          Chris A. Mattmann


          People

            Assignee: chrismattmann (Chris A. Mattmann)
            Reporter: chrismattmann (Chris A. Mattmann)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
