Details
- Type: Task
- Status: Closed
- Priority: Minor
- Resolution: Fixed
Description
We're about to release the first version of Crawler-Commons (http://code.google.com/p/crawler-commons/), which contains a parser for robots.txt files. This parser should also be better than the one we currently have in Nutch. I will delegate this functionality to CC as soon as it is publicly available.
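For context, a robots.txt parser turns "User-agent" groups and their "Disallow" lines into per-agent access rules. The following is a minimal self-contained sketch of that core matching logic, not the crawler-commons implementation (its `SimpleRobotRulesParser` also handles Allow rules, crawl delays, and multi-agent groups); the class and method names here are hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the core robots.txt matching idea:
// collect Disallow prefixes for the matching user-agent group,
// then check a path against those prefixes.
public class RobotsSketch {

    // Gather Disallow paths from groups whose User-agent line
    // matches the given agent name (or the "*" wildcard).
    static List<String> disallowedPaths(String robotsTxt, String agent) {
        List<String> rules = new ArrayList<>();
        boolean inMatchingGroup = false;
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.split("#", 2)[0].trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inMatchingGroup = value.equals("*")
                        || agent.toLowerCase(Locale.ROOT)
                                .contains(value.toLowerCase(Locale.ROOT));
            } else if (inMatchingGroup && field.equals("disallow") && !value.isEmpty()) {
                rules.add(value); // empty Disallow means "allow all", so skip it
            }
        }
        return rules;
    }

    // A URL path is allowed unless it starts with a disallowed prefix.
    static boolean isAllowed(String robotsTxt, String agent, String path) {
        for (String prefix : disallowedPaths(robotsTxt, agent)) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(robots, "nutch", "/private/page.html")); // false
        System.out.println(isAllowed(robots, "nutch", "/index.html"));        // true
    }
}
```

A production parser must do much more (Allow precedence, wildcard paths, crawl-delay, sitemap directives), which is exactly why delegating to a shared, well-tested library makes sense.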
Attachments
Issue Links
- is duplicated by: NUTCH-1008 Switch to crawler-commons version of robots.txt parsing code (Closed)
- is related to: NUTCH-1455 RobotRulesParser to match multi-word user-agent names (Closed)