Nutch / NUTCH-2216

db.ignore.*.links to optionally follow internal redirects

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.14
    • Component/s: fetcher
    • Labels: None

    Description

      When enabled, db.ignore.internal.links stops the crawler from following any internal hyperlinks or redirects. Together with db.ignore.external.links, it helps restrict the crawl to a predefined set of URLs, for example a list provided by a customer.

      In many cases, a few of those URLs are redirects, which are then not followed. This issue adds an option to allow internal redirects even when db.ignore.internal.links is enabled.
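
      The intended behaviour can be sketched roughly as follows. This is an illustrative sketch only, not the code from the attached patch: the class name, the method names, and the followInternalRedirects flag are hypothetical, and "internal" is assumed here to mean "same host", which may differ from how Nutch actually decides it.

```java
import java.net.URI;

// Hypothetical sketch; none of these names come from the Nutch code base.
class RedirectFilterSketch {

    // Assumption: a redirect is "internal" when source and target share a host.
    static boolean sameHost(String fromUrl, String toUrl) {
        String from = URI.create(fromUrl).getHost();
        String to = URI.create(toUrl).getHost();
        return from != null && from.equalsIgnoreCase(to);
    }

    // Decides whether a redirect from fromUrl to toUrl should be followed.
    // ignoreInternalLinks / ignoreExternalLinks mirror db.ignore.internal.links
    // and db.ignore.external.links; followInternalRedirects stands in for the
    // new option described in this issue (hypothetical name).
    static boolean shouldFollowRedirect(String fromUrl, String toUrl,
            boolean ignoreInternalLinks, boolean ignoreExternalLinks,
            boolean followInternalRedirects) {
        if (sameHost(fromUrl, toUrl)) {
            // Internal redirect: allowed unless internal links are ignored,
            // in which case the new option can still permit it.
            return !ignoreInternalLinks || followInternalRedirects;
        }
        // External redirect: governed only by the external-links setting.
        return !ignoreExternalLinks;
    }
}
```

      With both ignore settings enabled, a redirect within the same host is followed only when the hypothetical flag is set, while a redirect to another host is still rejected.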

      Attachments

        1. NUTCH-2216.patch
          4 kB
          Markus Jelsma
        2. NUTCH-2216.patch
          7 kB
          Markus Jelsma

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 4
