Nutch / NUTCH-324

db.score.link.internal and db.score.link.external are ignored

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8
    • Component/s: fetcher
    • Labels: None

    Description

      The configuration properties db.score.link.external and db.score.link.internal are ignored.
      For sites such as message boards, or pages that repeat a large navigation menu on every page, giving internal links a lower weight in scoring makes a lot of sense.
      This is also a serious problem for web spam: a spammer can currently set up a single domain with dynamically generated pages and thereby heavily manipulate Nutch scores.
      I therefore also suggest giving db.score.link.internal a default value of something like 0.25.
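
To illustrate the intended effect of the two properties, here is a minimal sketch of how a page's score might be split across its outlinks with separate factors for internal and external links. This is not Nutch's scoring code: the class `LinkScoreSketch`, its methods, and the rule that "internal" means "same host" are all assumptions made for illustration, and the 0.25 factor is simply the value suggested above.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Nutch code base.
final class LinkScoreSketch {

    // Assumption: a link is "internal" when source and target share the same host.
    static boolean isInternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        return fromHost != null && fromHost.equalsIgnoreCase(toHost);
    }

    // Splits pageScore evenly across the outlinks, then scales each share by the
    // internal or external factor (standing in for db.score.link.internal and
    // db.score.link.external).
    static Map<String, Float> distribute(String fromUrl, float pageScore,
                                         List<String> outlinks,
                                         float internalFactor, float externalFactor) {
        Map<String, Float> contributions = new LinkedHashMap<>();
        if (outlinks.isEmpty()) {
            return contributions;
        }
        float share = pageScore / outlinks.size();
        for (String toUrl : outlinks) {
            float factor = isInternal(fromUrl, toUrl) ? internalFactor : externalFactor;
            // Sum contributions if the same target is linked more than once.
            contributions.merge(toUrl, share * factor, Float::sum);
        }
        return contributions;
    }
}
```

With an internal factor of 0.25 and an external factor of 1.0, a page that links once to its own host and once to another host passes a quarter as much score to the internal target as to the external one. A site made of many pages on a single domain therefore gains much less from linking to itself.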

            People

              Assignee: Unassigned
              Reporter: Stefan Groschupf (joa23)