Details
- Type: Task
- Status: Closed
- Priority: Minor
- Resolution: Duplicate
Description
We're about to release the first version of Crawler-Commons (http://code.google.com/p/crawler-commons/), which contains a parser for robots.txt files. This parser should also be better than the one we currently have in Nutch. I will delegate this functionality to CC as soon as it is publicly available.
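For context, the kind of parsing being delegated boils down to grouping robots.txt rules by `User-agent` and checking request paths against `Disallow` prefixes. The sketch below is a minimal, self-contained illustration of that logic; it is not the crawler-commons API, and it simplifies group handling (each `User-agent` line starts a fresh group).

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustrative robots.txt rule matcher (a sketch of the
// general technique; NOT the crawler-commons parser).
public class RobotRulesSketch {
    private final List<String> disallowed = new ArrayList<>();

    // Collect Disallow prefixes from groups that apply to the given
    // agent; "*" groups apply to every crawler.
    public RobotRulesSketch(String robotsTxt, String agent) {
        boolean applies = false;
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.trim();
            int hash = line.indexOf('#');            // strip comments
            if (hash >= 0) line = line.substring(0, hash).trim();
            if (line.isEmpty()) continue;
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                applies = value.equals("*") || value.equalsIgnoreCase(agent);
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
    }

    // A path is allowed unless it starts with a disallowed prefix.
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        RobotRulesSketch rules = new RobotRulesSketch(robots, "nutch");
        System.out.println(rules.isAllowed("/index.html")); // true
        System.out.println(rules.isAllowed("/private/x"));  // false
    }
}
```

A production parser such as the one in crawler-commons additionally handles edge cases (case-insensitive directives, malformed lines, per-agent group precedence) that this sketch glosses over.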