Description
If the fetcher follows redirects (http.redirect.max > 0), many redirects from one site may point to the same URL. In this situation it would be useful if the fetcher could temporarily (for a configurable time period) deduplicate the redirect targets and skip every redirect to an already-seen target except the first one. Typical examples of duplicated redirect targets are:
- redirects issued instead of responding with HTTP status 404, with targets such as:
  /, /resource-not-found, /search/, /404, /error/not-found, /err/notfound.html
- a page to accept/decline cookies:
  /cookie_usage.php
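
The proposed behavior could be sketched roughly as follows. This is only an illustration of time-limited deduplication of redirect targets, not Nutch code: the class `RedirectDeduplicator`, its methods, and the `ttlMillis` parameter are hypothetical names, and how the time period would be configured in Nutch is left open.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical helper (not part of Nutch): remembers redirect targets for a
 * limited time so that only the first redirect to a target is followed.
 */
final class RedirectDeduplicator {

    // Assumed to come from a configurable time period; the name is illustrative.
    private final long ttlMillis;

    // Redirect target URL -> time (ms) at which it was last accepted.
    private final ConcurrentMap<String, Long> lastAccepted = new ConcurrentHashMap<>();

    RedirectDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns true if the redirect to targetUrl should be followed, i.e. the
     * target was not accepted within the last ttlMillis milliseconds.
     */
    boolean shouldFollow(String targetUrl, long nowMillis) {
        boolean[] follow = {false};
        // compute() runs atomically per key, so concurrent fetcher threads
        // cannot both accept the same target.
        lastAccepted.compute(targetUrl, (url, acceptedAt) -> {
            if (acceptedAt == null || nowMillis - acceptedAt >= ttlMillis) {
                follow[0] = true;
                return nowMillis;   // first sighting, or the old entry expired
            }
            return acceptedAt;      // duplicate within the time window
        });
        return follow[0];
    }

    /** Convenience overload using the system clock. */
    boolean shouldFollow(String targetUrl) {
        return shouldFollow(targetUrl, System.currentTimeMillis());
    }

    /** Drops expired entries so the map does not grow without bound. */
    void purgeExpired(long nowMillis) {
        lastAccepted.values().removeIf(acceptedAt -> nowMillis - acceptedAt >= ttlMillis);
    }

    /** Number of targets currently remembered. */
    int size() {
        return lastAccepted.size();
    }
}
```

In this sketch a fetcher thread would call `shouldFollow(target)` before queuing a redirect and skip the redirect when it returns false. Passing the clock value explicitly keeps the logic easy to test, and a periodic `purgeExpired` call bounds memory use.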
Issue Links
- duplicates
  NUTCH-1150 http.redirect.max can lead to multiple parses of the same url (Closed)