NUTCH-2065: Domain URL filter to support protocols

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.10
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      The filter allows all protocols for all whitelisted domains, hosts, or suffixes, but it usually makes little sense to index both the http and https URLs of the same domain. This is not unlike the host URL filter, which prevents indexing of duplicate hosts, e.g. apache.org and www.apache.org.
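
The idea in the description can be illustrated with a minimal sketch. This is not the attached patch and does not use Nutch's plugin API. The class name `ProtocolAwareDomainFilter`, the rule map, and the "empty set means any protocol" rule are all assumptions made for illustration. It matches exact hosts only (no domain or suffix matching) and returns the URL when accepted or `null` when rejected.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a host whitelist that can also restrict protocols.
 * Not the NUTCH-2065 patch and not tied to any Nutch interface.
 */
public class ProtocolAwareDomainFilter {

    // Whitelisted host -> allowed protocols. An empty set is assumed to
    // mean "any protocol", which corresponds to the behavior without
    // protocol restrictions.
    private final Map<String, Set<String>> rules;

    public ProtocolAwareDomainFilter(Map<String, Set<String>> rules) {
        this.rules = rules;
    }

    /** Returns the URL unchanged if accepted, or null if rejected. */
    public String filter(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // unparsable URL
        }
        String host = uri.getHost();
        String scheme = uri.getScheme();
        if (host == null || scheme == null) {
            return null;
        }
        Set<String> allowed = rules.get(host.toLowerCase(Locale.ROOT));
        if (allowed == null) {
            return null; // host is not whitelisted
        }
        if (allowed.isEmpty() || allowed.contains(scheme.toLowerCase(Locale.ROOT))) {
            return url;
        }
        return null; // host is whitelisted, but not for this protocol
    }
}
```

With a rule mapping `apache.org` to `https` only, `https://apache.org/` would pass and `http://apache.org/` would be dropped, so only one protocol variant of the domain gets indexed.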

      Attachments

        1. NUTCH-2065.patch
          5 kB
          Markus Jelsma
        2. NUTCH-2065.patch
          7 kB
          Markus Jelsma

            People

              Assignee: Unassigned
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: