NUTCH-2859: urlnormalizer-protocol: allow to normalize domains

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.18
    • Fix Version/s: 1.19
    • Component/s: plugin, urlnormalizer
    • Labels: None
    • Patch Info: Patch Available

    Description

      The plugin urlnormalizer-protocol normalizes the URL protocol/scheme for a given list of hosts to the desired "normal" protocol (usually either http or https). It would be handy to also allow domain names to be specified, so that all hosts and subdomains of a given domain are normalized.

      To stay backward-compatible, this could be done by treating *.example.org as a pattern that matches all hosts and subdomains of the domain example.org.
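
      The proposed matching could look roughly like the following sketch. It is not the plugin's actual code: the class ProtocolNormalizerSketch and its methods addRule and normalize are hypothetical names chosen for illustration. It also assumes that *.example.org matches only strict subdomains (www.example.org, a.b.example.org), so the bare host example.org still needs its own exact entry, and that only the scheme is rewritten while an explicit port is left untouched.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; names and behavior are assumptions, not Nutch code. */
class ProtocolNormalizerSketch {
  // Exact host name -> desired protocol (the existing, backward-compatible rules).
  private final Map<String, String> hostProtocols = new HashMap<>();
  // Domain name taken from a "*.domain" rule -> desired protocol.
  private final Map<String, String> domainProtocols = new HashMap<>();

  /** Registers a rule; a leading "*." marks a domain pattern. */
  void addRule(String pattern, String protocol) {
    String p = pattern.toLowerCase(Locale.ROOT);
    if (p.startsWith("*.")) {
      domainProtocols.put(p.substring(2), protocol);
    } else {
      hostProtocols.put(p, protocol);
    }
  }

  /** Returns the URL with its scheme replaced if a rule matches, else unchanged. */
  String normalize(String urlString) {
    URL url;
    try {
      url = new URL(urlString);
    } catch (MalformedURLException e) {
      return urlString; // leave unparsable URLs alone
    }
    String host = url.getHost().toLowerCase(Locale.ROOT);
    // Exact host rules take precedence over domain patterns.
    String protocol = hostProtocols.get(host);
    if (protocol == null) {
      protocol = lookupDomain(host);
    }
    if (protocol == null || protocol.equals(url.getProtocol())) {
      return urlString;
    }
    // Swap only the scheme; everything from the first ':' onwards is kept.
    String trimmed = urlString.trim();
    return protocol + trimmed.substring(trimmed.indexOf(':'));
  }

  /** Checks each parent domain of the host, longest (most specific) first. */
  private String lookupDomain(String host) {
    int dot = host.indexOf('.');
    while (dot >= 0) {
      String protocol = domainProtocols.get(host.substring(dot + 1));
      if (protocol != null) {
        return protocol;
      }
      dot = host.indexOf('.', dot + 1);
    }
    return null;
  }
}
```

      With the rules ("example.com", "https") and ("*.example.org", "https"), this sketch would rewrite http://www.example.org/ and http://example.com/ to https but leave http://www.example.com/ and http://example.org/ unchanged. Existing host-only rules therefore keep their current meaning.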

            People

              snagel Sebastian Nagel
              snagel Sebastian Nagel