Description
Sites whose robots.txt specifies a Crawl-Delay of more than 5 minutes (301 seconds or more) are always skipped, even if fetcher.max.crawl.delay is set to a higher value.
We need to pass higher values of fetcher.max.crawl.delay on to crawler-commons' robots.txt parser; otherwise the parser falls back to its internal default of 300 seconds and disallows all sites whose robots.txt specifies a longer Crawl-Delay.
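For illustration, a minimal sketch of the current behavior. The parseContent call is crawler-commons' public API; the setMaxCrawlDelay hook shown in the trailing comment is hypothetical, as the exact way to configure the limit depends on the crawler-commons version.
{code:java}
import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class RobotsParseSketch {
    public static void main(String[] args) {
        // robots.txt asking for a 10-minute delay (600 s > the 300 s built-in limit)
        byte[] robotsTxt = ("User-agent: *\nCrawl-delay: 600\n").getBytes();

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "http://www.example.com/robots.txt", robotsTxt,
                "text/plain", "mycrawler");

        // With the parser's internal 300 s default, a longer Crawl-Delay
        // makes the whole site disallowed, regardless of Nutch's setting:
        System.out.println(rules.isAllowNone()); // true

        // Intended fix (hypothetical setter; the real API may differ):
        // pass fetcher.max.crawl.delay (e.g. 700 s) through to the parser so
        // a 600 s Crawl-Delay is kept instead of turned into allow-none.
        // parser.setMaxCrawlDelay(700 * 1000L);
    }
}
{code}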