Nutch / NUTCH-1346

Follow outlinks to ignore external


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      The follow-outlinks feature already respects the db.ignore.external.links setting. As a consequence, when that setting is enabled, external outlinks of fetched pages are not saved in the parse data. There should be a new setting that stops the outlink follower from going external while still storing external outlinks.
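
The requested behaviour can be sketched roughly as below. This is a minimal illustration, not the code from the attached patch: the class `OutlinkPolicy`, its field names, and its methods are hypothetical, and "external" is assumed here to mean "different host than the fetched page". The first flag plays the role of db.ignore.external.links; the second stands in for the proposed new setting, whose actual name is not given in this issue.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical and not taken from Nutch.
final class OutlinkPolicy {
    // Plays the role of db.ignore.external.links: drop external outlinks entirely.
    private final boolean ignoreExternalInDb;
    // Stand-in for the proposed setting: do not follow external outlinks,
    // but leave them in the stored outlink list.
    private final boolean ignoreExternalWhenFollowing;

    OutlinkPolicy(boolean ignoreExternalInDb, boolean ignoreExternalWhenFollowing) {
        this.ignoreExternalInDb = ignoreExternalInDb;
        this.ignoreExternalWhenFollowing = ignoreExternalWhenFollowing;
    }

    // Assumption: a link is external when its host differs from the page's host.
    static boolean isExternal(String pageUrl, String linkUrl) {
        String pageHost = URI.create(pageUrl).getHost();
        String linkHost = URI.create(linkUrl).getHost();
        return pageHost == null || linkHost == null
                || !pageHost.equalsIgnoreCase(linkHost);
    }

    // Outlinks that would be kept in the parse data.
    List<String> toStore(String pageUrl, List<String> outlinks) {
        return filter(pageUrl, outlinks, ignoreExternalInDb);
    }

    // Outlinks that the follower would queue for fetching.
    List<String> toFollow(String pageUrl, List<String> outlinks) {
        boolean dropExternal = ignoreExternalInDb || ignoreExternalWhenFollowing;
        return filter(pageUrl, outlinks, dropExternal);
    }

    private static List<String> filter(String pageUrl, List<String> outlinks,
                                       boolean dropExternal) {
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            if (!dropExternal || !isExternal(pageUrl, link)) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

With the first flag off and the second on, `toStore` returns every outlink while `toFollow` returns only same-host links, which is the combination the description asks for. With the first flag on, external links are dropped from both lists, matching the existing behaviour.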

      Attachments

        1. NUTCH-1346-1.6-1.patch
          3 kB
          Markus Jelsma


            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 2
