[NUTCH-2221] Introduce db.ignore.internal.links to FetcherThread

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: fetcher
    • Labels:
      None

      Description

      FetcherThread already supports db.ignore.external.links. The configuration also defines db.ignore.internal.links, but that property only affects LinkDB, which is confusing. This patch introduces db.ignore.internal.links to FetcherThread, analogous to db.ignore.external.links. With both parameters set to true, internal and external outlinks are both ignored, so you can limit the crawl to the injected seed list.
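
      The intended behaviour of the two flags can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class OutlinkFilterSketch and the helpers keepOutlink and hostOf are hypothetical names, and comparing links by exact host name is an assumption made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch; not taken from Nutch or from the attached patch.
public class OutlinkFilterSketch {

    /**
     * Decides whether an outlink should be kept.
     * A link is treated as "internal" when source and target share the
     * same host (an assumption made for this example).
     */
    static boolean keepOutlink(String fromUrl, String toUrl,
                               boolean ignoreInternalLinks,
                               boolean ignoreExternalLinks) {
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(toUrl);
        if (fromHost == null || toHost == null) {
            return false; // cannot classify the link, so drop it
        }
        boolean internal = fromHost.equals(toHost);
        if (internal && ignoreInternalLinks) {
            return false; // mirrors db.ignore.internal.links=true
        }
        if (!internal && ignoreExternalLinks) {
            return false; // mirrors db.ignore.external.links=true
        }
        return true;
    }

    /** Returns the lower-cased host of a URL, or null if it has none. */
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

      With both flags set to true, keepOutlink rejects every link, whether it points to the same host or a different one, which is why the crawl stays on the injected seed list.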

        Attachments

        1. NUTCH-2221.patch
          10 kB
          Markus Jelsma
        2. NUTCH-2221.patch
          9 kB
          Markus Jelsma
        3. NUTCH-2216-NUTCH-2220-NUTCH-2221.patch
          13 kB
          Markus Jelsma

              People

              • Assignee: Markus Jelsma (markus17)
              • Reporter: Markus Jelsma (markus17)
              • Votes: 0
              • Watchers: 3