Description
FetcherThread supports db.ignore.external.links. The configuration also contains db.ignore.internal.links, but that property only operates on LinkDB, which is confusing. This patch introduces db.ignore.internal.links to FetcherThread, analogous to db.ignore.external.links. With both parameters set to true, you can limit the crawl to the injected seed list.
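The intended behaviour can be sketched roughly as follows. This is an illustration only, not the code from the patch: the class OutlinkFilterSketch and the methods hostOf and keepOutlink are hypothetical names, and comparing links by exact host name is an assumption about how "internal" and "external" are decided.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not the actual FetcherThread code.
public class OutlinkFilterSketch {

    /** Returns the lower-cased host of a URL, or null if it cannot be parsed. */
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /**
     * Decides whether an outlink should be kept.
     * ignoreInternalLinks stands in for db.ignore.internal.links,
     * ignoreExternalLinks stands in for db.ignore.external.links.
     * Assumption: a link is "internal" when source and target hosts are equal.
     */
    static boolean keepOutlink(String fromUrl, String toUrl,
                               boolean ignoreInternalLinks,
                               boolean ignoreExternalLinks) {
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(toUrl);
        if (fromHost == null || toHost == null) {
            return false; // unparsable URLs are dropped in this sketch
        }
        boolean internal = fromHost.equals(toHost);
        if (internal && ignoreInternalLinks) {
            return false;
        }
        if (!internal && ignoreExternalLinks) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String page = "http://example.org/index.html";
        // With both flags set, neither kind of outlink is followed,
        // so only the injected seeds themselves are fetched.
        System.out.println(keepOutlink(page, "http://example.org/about.html", true, true));
        System.out.println(keepOutlink(page, "http://other.example/", true, true));
    }
}
```

In this sketch, setting both flags to true rejects every outlink, which is what restricts the crawl to the seed list; setting only one of them restricts the crawl to external or to internal links respectively.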
Attachments
Issue Links
- breaks: NUTCH-2144 Plugin to override db.ignore.external to exempt interesting external domain URLs (Closed)
- depends upon: NUTCH-2220 Rename db.* options used only by the linkdb to linkdb.* (Closed)
- is related to: NUTCH-2216 db.ignore.*.links to optionally follow internal redirects (Closed)