[NUTCH-353] pages that serverside forwards will be refetched every time - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.1, 0.9.0
Fix Version/s: 1.0.0
Component/s: None
Labels: None

Description

Pages that do a server-side forward are not written back into the crawlDb with a status change, and their nextFetchTime is not updated.
As a result, the same page is refetched again and again: Nutch is not polite, because it refetches both the forwarding page and the target page in each segment iteration. It also affects scoring, since the forwarding page contributes its score to all outlinks.
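
To make the failure mode concrete, here is a minimal sketch of the bookkeeping the description says is missing. It is not Nutch code and does not reflect the attached patch: the class `ForwardUpdateSketch`, the `Entry` type, the `Status` values, the 30-day interval, and the example URLs are all hypothetical, and it assumes the fetcher followed the forward and fetched the target in the same pass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a crawl database; not the Nutch API.
class ForwardUpdateSketch {
    enum Status { UNFETCHED, FETCHED, REDIRECTED }

    // One record per URL: its last known status and when it is next due.
    static final class Entry {
        Status status = Status.UNFETCHED;
        long nextFetchTime = 0L; // 0 means "due immediately"
    }

    // Assumed refetch interval (30 days), chosen only for illustration.
    static final long FETCH_INTERVAL_MS = 30L * 24 * 60 * 60 * 1000;

    final Map<String, Entry> crawlDb = new HashMap<>();

    void inject(String url) {
        crawlDb.putIfAbsent(url, new Entry());
    }

    // A URL is selected for fetching when its nextFetchTime has passed.
    boolean isDue(String url, long now) {
        Entry e = crawlDb.get(url);
        return e != null && e.nextFetchTime <= now;
    }

    // The step the issue reports as missing: after a forward, write the
    // status change back and push nextFetchTime forward for both pages.
    void recordForward(String from, String to, long now) {
        Entry source = crawlDb.computeIfAbsent(from, k -> new Entry());
        source.status = Status.REDIRECTED;
        source.nextFetchTime = now + FETCH_INTERVAL_MS;

        Entry target = crawlDb.computeIfAbsent(to, k -> new Entry());
        target.status = Status.FETCHED;
        target.nextFetchTime = now + FETCH_INTERVAL_MS;
    }

    public static void main(String[] args) {
        ForwardUpdateSketch db = new ForwardUpdateSketch();
        String from = "http://example.com/old";
        String to = "http://example.com/new";
        long now = System.currentTimeMillis();

        db.inject(from);
        System.out.println("due before update: " + db.isDue(from, now));
        db.recordForward(from, to, now);
        System.out.println("due after update:  " + db.isDue(from, now + 1));
    }
}
```

Without the `recordForward` step, the forwarding URL keeps its initial `nextFetchTime` and is selected again in every iteration, which matches the behavior reported here.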

Attachments

doNotRefecthForwarderPagesV1.patch
18/Aug/06 04:50
0.7 kB
Stefan Groschupf

Issue Links

depends upon: NUTCH-273 When a page is redirected, the original url is NOT updated. (Closed)

People

Assignee: Andrzej Bialecki
Reporter: Stefan Groschupf
Votes: 3
Watchers: 3

Dates

Created: 18/Aug/06 04:50
Updated: 02/May/13 02:28
Resolved: 03/Feb/09 13:19