Issue Details

Key: NUTCH-273
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Andrzej Bialecki
Reporter: Lukas Vlcek
Votes: 5
Watchers: 6
Nutch

When a page is redirected, the original url is NOT updated.

Created: 20/May/06 04:23 PM   Updated: 28/Dec/06 12:18 AM
Component/s: fetcher
Affects Version/s: 0.8
Fix Version/s: 0.9.0

Time Tracking:
Not Specified

File Attachments:
Fetcher.java-489586.diff (0.6 kB, 2006-12-22 09:38 AM, Eelco Lempsink; licensed for inclusion in ASF works)
Environment: n/a

Resolution Date: 28/Dec/06 12:18 AM


Description
[Excerpt from the mailing list, sender: Andrzej Bialecki]
When a page is redirected, the original URL is NOT updated, so CrawlDB will never know that a redirect occurred; it won't even know that a fetch occurred... This looks like a bug.
In 0.7 this was recorded in the segment, and it would then affect the Page status during updatedb. It should do so in 0.8, too...
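
To illustrate the reported behavior, here is a minimal, self-contained sketch. It is not Nutch code: `RedirectSketch`, `CrawlDb`, `Status`, `recordFetchBuggy`, and `recordFetchExpected` are all hypothetical names, and the toy map only stands in for the real crawl database. It also assumes, for illustration, that the redirect target does get recorded; the report itself only says that the original URL is left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of the problem; not Nutch's actual classes. */
public class RedirectSketch {

    /** Toy status values (assumed for this sketch). */
    public enum Status { UNFETCHED, FETCHED, REDIRECTED }

    /** Stand-in for the crawl database: URL -> last known status. */
    public static final class CrawlDb {
        private final Map<String, Status> entries = new LinkedHashMap<>();

        /** Adds a URL as UNFETCHED unless it is already known. */
        public void inject(String url) { entries.putIfAbsent(url, Status.UNFETCHED); }

        /** Overwrites the status stored for a URL. */
        public void update(String url, Status status) { entries.put(url, status); }

        /** Returns the stored status, or null if the URL is unknown. */
        public Status statusOf(String url) { return entries.get(url); }
    }

    /**
     * Behavior described in the report: the original URL is never updated.
     * Recording the target is an assumption made for illustration only.
     */
    public static void recordFetchBuggy(CrawlDb db, String original, String target) {
        db.update(target, Status.FETCHED);
    }

    /** Expected behavior: the original URL is marked as redirected as well. */
    public static void recordFetchExpected(CrawlDb db, String original, String target) {
        db.update(original, Status.REDIRECTED);
        db.update(target, Status.FETCHED);
    }

    public static void main(String[] args) {
        String original = "http://example.com/old";
        String target = "http://example.com/new";

        CrawlDb buggy = new CrawlDb();
        buggy.inject(original);
        recordFetchBuggy(buggy, original, target);
        System.out.println("buggy:    " + original + " -> " + buggy.statusOf(original));

        CrawlDb expected = new CrawlDb();
        expected.inject(original);
        recordFetchExpected(expected, original, target);
        System.out.println("expected: " + original + " -> " + expected.statusOf(original));
    }
}
```

In the buggy variant the original URL keeps its injected status, so a later update pass has no way to tell that it was ever fetched or redirected.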
