Description
When db.ignore.internal.links is enabled, the crawler does not follow any internal hyperlinks or redirects. Together with db.ignore.external.links, it helps restrict the crawl to a predefined set of URLs, for example a list provided by a customer.
In many cases, a few of those URLs are redirects, which are then not followed. This issue adds an option to allow internal redirects even when db.ignore.internal.links is enabled.
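The intended behaviour can be sketched roughly as follows. This is only an illustration in Python, not Nutch's actual Java implementation: the function names and the `allow_internal_redirects` flag are hypothetical (the issue text does not name the new option), and "internal" is assumed here to mean "same host".

```python
from urllib.parse import urlsplit


def same_host(from_url, to_url):
    """Treat two URLs as internal to each other when their hosts match (assumption)."""
    return urlsplit(from_url).hostname == urlsplit(to_url).hostname


def should_follow(from_url, to_url, is_redirect,
                  ignore_internal=True, ignore_external=True,
                  allow_internal_redirects=False):
    """Decide whether a link or redirect target should be followed.

    ignore_internal / ignore_external mirror db.ignore.internal.links and
    db.ignore.external.links; allow_internal_redirects is a hypothetical
    name for the option proposed in this issue.
    """
    if same_host(from_url, to_url):
        if not ignore_internal:
            return True
        # Internal links are ignored; only redirects may pass, and only
        # when the new option is switched on.
        return is_redirect and allow_internal_redirects
    return not ignore_external
```

With both ignore options enabled, an internal redirect is followed only when the hypothetical flag is set; ordinary internal hyperlinks and all external targets are still dropped.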
Attachments
Issue Links
- relates to
  - NUTCH-2365 HTTP Redirects to SubDomains don't get crawled if db.ignore.external.links.mode == byDomain (Closed)
  - NUTCH-2221 Introduce db.ignore.internal.links to FetcherThread (Closed)
  - NUTCH-2220 Rename db.* options used only by the linkdb to linkdb.* (Closed)