Nutch / NUTCH-2065

Domain URL filter to support protocols

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.10
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

      Description

      The domain URL filter allows every protocol for each whitelisted domain, host or suffix, but it usually makes little sense to index both the http and https URLs of the same domain. The problem resembles the one handled by the host URL filter, which prevents indexing of duplicate hosts, e.g. apache.org and www.apache.org.
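
      The idea can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use the Nutch plugin API; the class name ProtocolAwareDomainFilter and its allow method are hypothetical. It assumes exact host matching only (no domain or suffix matching) and borrows the URL filter convention of returning the URL when it is accepted and null when it is rejected.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: a whitelist that can pin a host to one protocol.
final class ProtocolAwareDomainFilter {

  // Maps a lower-cased host to the required protocol; a null value means "any protocol".
  private final Map<String, String> allowed = new HashMap<>();

  // Whitelist a host, optionally restricted to a single protocol such as "https".
  void allow(String host, String protocol) {
    allowed.put(host.toLowerCase(Locale.ROOT),
        protocol == null ? null : protocol.toLowerCase(Locale.ROOT));
  }

  // Returns the URL unchanged if it passes the filter, otherwise null.
  String filter(String url) {
    final URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return null; // reject URLs that cannot be parsed
    }
    String scheme = uri.getScheme();
    String host = uri.getHost();
    if (scheme == null || host == null) {
      return null; // relative or otherwise incomplete URLs are rejected
    }
    host = host.toLowerCase(Locale.ROOT);
    if (!allowed.containsKey(host)) {
      return null; // host is not whitelisted at all
    }
    String required = allowed.get(host);
    if (required == null || required.equalsIgnoreCase(scheme)) {
      return url; // host is whitelisted and the protocol is acceptable
    }
    return null; // host is whitelisted, but only for a different protocol
  }
}
```

      With a rule such as allow("apache.org", "https"), the https URL of that host passes while its http counterpart is dropped, so only one variant would be indexed. A host registered with a null protocol keeps the existing behaviour of accepting every protocol.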

        Attachments

        1. NUTCH-2065.patch
          5 kB
          Markus Jelsma
        2. NUTCH-2065.patch
          7 kB
          Markus Jelsma

              People

              • Assignee: Unassigned
              • Reporter: Markus Jelsma (markus17)
              • Votes: 0
              • Watchers: 2