  1. Nutch
  2. NUTCH-2220

Rename db.* options used only by the linkdb to linkdb.*


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: linkdb
    • Labels: None
    • Patch Info: Patch Available

    Description

      We need an option db.ignore.internal.links that operates in FetcherThread, just like db.ignore.external.links. The option already exists, but it is used only by the LinkDB and defaults to true, which is not a good default for FetcherThread.

      I propose making a clear distinction between options that are used by the LinkDB and those that are not. Most options used by the LinkDB already have the right prefix, but db.ignore.*.links, db.max.inlinks and db.max.anchor.length do not yet.

      This patch renames those options to use the linkdb.* prefix, so that afterwards we can implement a db.ignore.internal.links that operates in FetcherThread, just like db.ignore.external.links.

      This will introduce a change in default parameters. Please comment.

      How to upgrade from earlier releases

      • replace your old conf/nutch-default.xml with the conf/nutch-default.xml from the Nutch 1.12 release
      • if you use the LinkDB (e.g. invertlinks) and have modified any of the parameters db.max.inlinks, db.max.anchor.length or db.ignore.internal.links, rename them to linkdb.max.inlinks, linkdb.max.anchor.length and linkdb.ignore.internal.links respectively
      • db.ignore.internal.links and db.ignore.external.links now operate on the CrawlDB only
      • linkdb.ignore.internal.links and linkdb.ignore.external.links now operate on the LinkDB only
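
      The renames in the upgrade notes above could be applied to a local override file with a small script. The following is only a sketch, not part of the patch: it assumes the overrides live in a Hadoop-style configuration file (such as conf/nutch-site.xml) made of <property><name>/<value> entries, and the function name rename_linkdb_properties and the sample values are hypothetical. It deliberately leaves db.ignore.external.links alone, since the notes above list only three parameters to rename.

```python
import xml.etree.ElementTree as ET

# Old name -> new name, as listed in the upgrade notes above.
RENAMES = {
    "db.max.inlinks": "linkdb.max.inlinks",
    "db.max.anchor.length": "linkdb.max.anchor.length",
    "db.ignore.internal.links": "linkdb.ignore.internal.links",
}


def rename_linkdb_properties(xml_text: str) -> str:
    """Return the configuration XML with the LinkDB-only options renamed."""
    root = ET.fromstring(xml_text)
    for name in root.iter("name"):
        key = (name.text or "").strip()
        if key in RENAMES:
            name.text = RENAMES[key]
    return ET.tostring(root, encoding="unicode")


# Hypothetical override file content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>db.max.inlinks</name>
    <value>5000</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(rename_linkdb_properties(SAMPLE))
```

      Note that the default ElementTree parser drops XML comments, so a file with hand-written comments may be better edited manually.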

      Attachments

        1. NUTCH-2220.patch
          4 kB
          Markus Jelsma

            People

              Assignee: markus17 Markus Jelsma
              Reporter: markus17 Markus Jelsma