  1. Nutch
  2. NUTCH-1346

Follow outlinks to ignore external

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: fetcher
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The follow-outlinks feature already respects the db.ignore.external.links setting. However, when that setting is enabled, external outlinks of fetched pages are not saved in the parse data. There should be a new setting that prevents the outlink follower from going to external hosts while still storing external outlinks.
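
      The requested behaviour can be sketched roughly as below. This is only an illustration of the idea, not the code from the attached patch or from Nutch's fetcher: the class `OutlinkFollowSketch`, the method `split`, and the flag `followIgnoreExternal` are hypothetical names, and "external" is assumed here to mean "different host than the fetched page".

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: store every outlink, but follow only internal ones
// when the (assumed) followIgnoreExternal flag is set.
public class OutlinkFollowSketch {

    /** Outlinks kept in parse data versus outlinks handed to the follower. */
    public static final class Result {
        public final List<String> stored;
        public final List<String> followed;

        Result(List<String> stored, List<String> followed) {
            this.stored = stored;
            this.followed = followed;
        }
    }

    public static Result split(String pageUrl, List<String> outlinks,
                               boolean followIgnoreExternal) {
        String pageHost = hostOf(pageUrl);
        // All outlinks are stored, external or not.
        List<String> stored = new ArrayList<>(outlinks);
        List<String> followed = new ArrayList<>();
        for (String link : outlinks) {
            String host = hostOf(link);
            // Links without a parsable host are treated as external.
            boolean external = host == null || !host.equals(pageHost);
            if (followIgnoreExternal && external) {
                continue; // stored above, but not followed
            }
            followed.add(link);
        }
        return new Result(stored, followed);
    }

    // Returns the lower-cased host of a URL, or null if it cannot be determined.
    private static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

      With the flag set, a page on example.org that links to both example.org and other.example.com would keep both outlinks in its stored list, while only the example.org link would be queued for following.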

              People

              • Assignee:
                markus17 Markus Jelsma
                Reporter:
                markus17 Markus Jelsma
              • Votes:
                0
                Watchers:
                2
