Key: NUTCH-324
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Unassigned
Reporter: Stefan Groschupf
Votes: 0
Watchers: 0

Nutch

db.score.link.internal and db.score.link.external are ignored

Created: 19/Jul/06 11:47 PM   Updated: 01/Aug/06 05:00 PM
Component/s: fetcher
Affects Version/s: None
Fix Version/s: 0.8

Time Tracking:
Not Specified

File Attachments:
InternalAndExternalLinkScoreFactor.patch (text file, 2 kB, licensed for inclusion in ASF works), attached 2006-07-19 11:53 PM by Stefan Groschupf
Issue Links:
Duplicate
 

Resolution Date: 24/Jul/06 03:26 PM


Description
The configuration properties db.score.link.external and db.score.link.internal are ignored.
For message-board pages, or pages that repeat a large navigation menu on every page, giving internal links a lower impact on scoring makes a lot of sense.
This is also a serious problem for web spam: spammers can currently set up a single domain with dynamically generated pages and use it to heavily manipulate Nutch scores.
I therefore also suggest giving db.score.link.internal a default value of something like 0.25.

Stefan Groschupf added a comment - 19/Jul/06 11:53 PM
Multiply the score of a page by db.score.link.internal or db.score.link.external during distributeScoreToOutlink.
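
For readers without the patch at hand, the idea can be sketched roughly as follows. This is not the attached patch or Nutch's actual code: the class and method names are hypothetical, "internal" is assumed here to mean that both URLs share the same host, and the factor 0.25 is simply the default suggested in the description.

```java
import java.net.URI;

// Hypothetical sketch, not the Nutch implementation or the attached patch.
public class LinkScoreSketch {

    // Assumption: a link counts as internal when source and target
    // have the same host (compared case-insensitively).
    static boolean isInternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        return fromHost != null && fromHost.equalsIgnoreCase(toHost);
    }

    // Scales the score handed to an outlink by the factor that would be
    // read from db.score.link.internal or db.score.link.external.
    static float scaleOutlinkScore(String fromUrl, String toUrl, float score,
                                   float internalFactor, float externalFactor) {
        float factor = isInternal(fromUrl, toUrl) ? internalFactor : externalFactor;
        return score * factor;
    }

    public static void main(String[] args) {
        float internalFactor = 0.25f; // value suggested in the description
        float externalFactor = 1.0f;  // assumed: external links keep full weight

        // Same host: the outlink receives only a quarter of the score.
        System.out.println(scaleOutlinkScore(
            "http://forum.example.com/a", "http://forum.example.com/b",
            1.0f, internalFactor, externalFactor));

        // Different host: the outlink receives the full score.
        System.out.println(scaleOutlinkScore(
            "http://forum.example.com/a", "http://other.example.org/",
            1.0f, internalFactor, externalFactor));
    }
}
```

With factors like these, a site that links heavily to its own pages passes on less score per internal link than it would to an external target, which is the effect the description asks for.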

Andrzej Bialecki added a comment - 24/Jul/06 03:26 PM
Patch applied, with minor whitespace diffs and doc. clarifications. Thank you!