Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Versions: 0.8.1, 0.9.0
Description
Pages that do a server-side forward are not written back into the crawlDb with a status change, and their nextFetchTime is not updated.
As a result, the same page is refetched again and again. Nutch is not polite, because it refetches both the forwarding page and the target page in every segment iteration. It also affects scoring, since the forwarding page contributes its score to all of its outlinks.
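The expected behavior can be illustrated with a toy model. This is only a minimal sketch and not Nutch's actual crawlDb code: the class `RedirectCrawlDbSketch`, its `Status` enum, and the methods `inject`, `isDue`, `recordRedirect`, and `statusOf` are hypothetical names introduced for illustration, and the fixed fetch interval is an assumption. It shows that once the forwarding page's status and next fetch time are written back, the page is no longer selected in the next iteration, while the target is scheduled as its own entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a crawl database; hypothetical, not Nutch's real API. */
final class RedirectCrawlDbSketch {

    enum Status { UNFETCHED, REDIRECTED }

    /** Hypothetical per-URL record: status plus earliest next fetch time. */
    static final class Entry {
        Status status = Status.UNFETCHED;
        long nextFetchTime = 0L; // epoch millis; 0 means "due immediately"
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long fetchIntervalMs; // assumed fixed refetch interval

    RedirectCrawlDbSketch(long fetchIntervalMs) {
        this.fetchIntervalMs = fetchIntervalMs;
    }

    /** Adds a URL as unfetched unless it is already known. */
    void inject(String url) {
        entries.putIfAbsent(url, new Entry());
    }

    /** A known URL is due once its next fetch time has been reached. */
    boolean isDue(String url, long now) {
        Entry e = entries.get(url);
        return e != null && e.nextFetchTime <= now;
    }

    /**
     * Writes the redirect back: the source gets a new status and a later
     * next fetch time, and the target is added as a separate entry.
     * Skipping the two assignments reproduces the reported bug, because
     * the source would stay due in every iteration.
     */
    void recordRedirect(String from, String to, long now) {
        Entry source = entries.computeIfAbsent(from, k -> new Entry());
        source.status = Status.REDIRECTED;
        source.nextFetchTime = now + fetchIntervalMs;
        inject(to);
    }

    Status statusOf(String url) {
        Entry e = entries.get(url);
        return e == null ? null : e.status;
    }

    public static void main(String[] args) {
        RedirectCrawlDbSketch db = new RedirectCrawlDbSketch(1_000L);
        db.inject("http://example.com/old");
        db.recordRedirect("http://example.com/old", "http://example.com/new", 5L);
        System.out.println(db.isDue("http://example.com/old", 6L)); // false
        System.out.println(db.isDue("http://example.com/new", 6L)); // true
    }
}
```

In this sketch the forwarding page becomes due again only after the assumed interval has elapsed, which is the polite behavior the report asks for.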
Attachments
Issue Links
- depends upon: NUTCH-273 When a page is redirected, the original url is NOT updated. (Closed)