Nutch / NUTCH-2069

Ignore external links based on domain


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10
    • Fix Version/s: 1.11
    • Component/s: fetcher, parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      We currently have `db.ignore.external.links`, which is a nice way of restricting the crawl based on the hostname. This adds a new parameter, `db.ignore.external.links.domain`, to do the same based on the domain.
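
      To make the difference between the two checks concrete, here is a minimal sketch of host-based versus domain-based filtering. It is not the code from the attached patches: the class `ExternalLinkCheck`, the methods `isExternal` and `naiveDomain`, and the `byDomain` flag are all hypothetical names chosen for illustration. The domain extraction simply keeps the last two labels of the host, which is an assumption that breaks on public suffixes such as `co.uk`; a real crawler would use a suffix-aware lookup.

```java
import java.net.URI;

public class ExternalLinkCheck {

    // Hypothetical, naive stand-in for a real domain extractor: keeps the
    // last two labels of the host and ignores public suffixes like "co.uk".
    static String naiveDomain(String host) {
        String lower = host.toLowerCase();
        String[] labels = lower.split("\\.");
        int n = labels.length;
        return n <= 2 ? lower : labels[n - 2] + "." + labels[n - 1];
    }

    // Returns true when the outlink would be treated as external.
    // byDomain = false compares full hostnames; byDomain = true compares
    // the (naively extracted) domains instead.
    static boolean isExternal(String fromUrl, String toUrl, boolean byDomain) {
        String fromHost = URI.create(fromUrl).getHost().toLowerCase();
        String toHost = URI.create(toUrl).getHost().toLowerCase();
        if (byDomain) {
            return !naiveDomain(fromHost).equals(naiveDomain(toHost));
        }
        return !fromHost.equals(toHost);
    }

    public static void main(String[] args) {
        String from = "http://www.example.com/index.html";
        String to = "http://blog.example.com/post";
        System.out.println(isExternal(from, to, false)); // true: hosts differ
        System.out.println(isExternal(from, to, true));  // false: same domain
    }
}
```

      With the host-based check, a link from `www.example.com` to `blog.example.com` counts as external; with the domain-based check, it stays inside the crawl because both hosts share the domain `example.com`.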

      Attachments

        1. NUTCH-2069.patch
          23 kB
          Julien Nioche
        2. NUTCH-2069.v2.patch
          8 kB
          Julien Nioche

            People

              Assignee: Unassigned
              Reporter: Julien Nioche (jnioche)
              Votes: 0
              Watchers: 5
