Issue: NUTCH-2221 (project: Nutch)

Introduce db.ignore.internal.links to FetcherThread


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: fetcher
    • Labels: None

    Description

      FetcherThread already supports db.ignore.external.links. The configuration also contains db.ignore.internal.links, but that property only takes effect on the LinkDB, which is confusing. This patch makes FetcherThread honor db.ignore.internal.links in the same way as db.ignore.external.links. With both parameters set to true, links to other hosts and links within the same host are both ignored, so the crawl is limited to the injected seed list.
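      To illustrate how the two flags combine, here is a minimal Java sketch of an outlink check. It assumes outlinks are classified as internal or external by comparing host names; the class and method names (OutlinkFilterSketch, filterOutlinks) are hypothetical, and the two booleans only stand in for the db.ignore.external.links and db.ignore.internal.links properties. This is not the code from the attached patch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Nutch's FetcherThread implementation.
class OutlinkFilterSketch {

    /** Returns the lower-cased host of a URL, or null if it has none or cannot be parsed. */
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /**
     * Keeps only the outlinks allowed by the two flags.
     * ignoreExternal stands in for db.ignore.external.links (drop links to other hosts);
     * ignoreInternal stands in for db.ignore.internal.links (drop links to the same host).
     * Assumption: "internal" means "same host as the page the link was found on".
     */
    static List<String> filterOutlinks(String fromUrl, List<String> outlinks,
                                       boolean ignoreExternal, boolean ignoreInternal) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String toUrl : outlinks) {
            String toHost = hostOf(toUrl);
            if (fromHost == null || toHost == null) {
                continue; // skip URLs without a usable host
            }
            boolean sameHost = fromHost.equals(toHost);
            if (ignoreExternal && !sameHost) {
                continue; // external link, ignored
            }
            if (ignoreInternal && sameHost) {
                continue; // internal link, ignored
            }
            kept.add(toUrl);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
                "https://example.org/about",
                "https://other.example.com/page");
        // Both flags on: every outlink is dropped, so only seed URLs get fetched.
        System.out.println(filterOutlinks("https://example.org/", links, true, true));  // prints []
        // Only external links ignored: the same-host link survives.
        System.out.println(filterOutlinks("https://example.org/", links, true, false)); // prints [https://example.org/about]
    }
}
```

      Because each flag removes one of two disjoint classes of links, enabling both leaves no outlinks at all, which is why the crawl cannot grow beyond the injected seeds.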

      Attachments

        1. NUTCH-2221.patch
          10 kB
          Markus Jelsma
        2. NUTCH-2221.patch
          9 kB
          Markus Jelsma
        3. NUTCH-2216-NUTCH-2220-NUTCH-2221.patch
          13 kB
          Markus Jelsma


            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 4
