NUTCH-2220

Rename db.* options used only by the linkdb to linkdb.*

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: linkdb
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      We need an option db.ignore.internal.links that operates in FetcherThread, just like db.ignore.external.links. An option with that name already exists, but it is used only by the LinkDB and defaults to true, which is not a suitable default for FetcherThread.

      I propose to make a clear distinction between options that are used by the LinkDB and options that are not. Most options used by the LinkDB already carry the right prefix, but db.ignore.*.links, db.max.inlinks and db.max.anchor.length do not yet.

      This patch renames those options to use the linkdb.* prefix, so that afterwards we can implement a db.ignore.internal.links that operates in FetcherThread, just like db.ignore.external.links.

      This will introduce a change in default parameters. Please comment.

      How to upgrade from earlier releases

      • replace your old conf/nutch-default.xml with the conf/nutch-default.xml from the Nutch 1.12 release
      • if you use the LinkDB (e.g. invertlinks) and have modified any of the parameters db.max.inlinks, db.max.anchor.length or db.ignore.internal.links, rename them to linkdb.max.inlinks, linkdb.max.anchor.length and linkdb.ignore.internal.links respectively
      • db.ignore.internal.links and db.ignore.external.links now operate on the CrawlDB only
      • linkdb.ignore.internal.links and linkdb.ignore.external.links now operate on the LinkDB only
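
      The rename step above could be scripted for a site-specific override file. The following is only a minimal sketch, not part of the patch: the function name rename_linkdb_properties is hypothetical, the sample values are placeholders, and it assumes the usual Hadoop-style layout of <configuration> containing <property> elements with <name> and <value> children. It leaves db.ignore.external.links untouched, since that name remains valid for the CrawlDB, and it renames db.ignore.internal.links on the assumption that an existing override was meant for the LinkDB. Parsing with ElementTree drops XML comments and the XML declaration, so review the result before replacing a real file.

```python
import xml.etree.ElementTree as ET

# Old name -> new name, as listed in the upgrade notes above.
RENAMES = {
    "db.max.inlinks": "linkdb.max.inlinks",
    "db.max.anchor.length": "linkdb.max.anchor.length",
    "db.ignore.internal.links": "linkdb.ignore.internal.links",
}


def rename_linkdb_properties(xml_text):
    """Rename LinkDB-only properties in a Hadoop-style configuration document.

    Returns a tuple (new_xml_text, renamed), where renamed lists the old
    property names that were found and rewritten.
    """
    root = ET.fromstring(xml_text)
    renamed = []
    for prop in root.iter("property"):
        name = prop.find("name")
        if name is None or name.text is None:
            continue  # skip malformed <property> entries
        key = name.text.strip()
        if key in RENAMES:
            name.text = RENAMES[key]
            renamed.append(key)
    return ET.tostring(root, encoding="unicode"), renamed


if __name__ == "__main__":
    # Placeholder overrides, for illustration only.
    sample = """<configuration>
  <property><name>db.max.inlinks</name><value>5000</value></property>
  <property><name>db.ignore.external.links</name><value>true</value></property>
</configuration>"""
    new_xml, renamed = rename_linkdb_properties(sample)
    print("renamed:", renamed)
    print(new_xml)
```

      After running something like this, it is worth checking whether db.ignore.internal.links should also be set explicitly for the CrawlDB side, because the two options are independent after the upgrade.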

        Attachments

        1. NUTCH-2220.patch
          4 kB
          Markus Jelsma

              People

              • Assignee:
                markus17 Markus Jelsma
                Reporter:
                markus17 Markus Jelsma