Nutch / NUTCH-2858

urlnormalizer-protocol: URL port is lost during normalization


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.18
    • Fix Version/s: 1.19
    • Component/s: plugin, urlnormalizer
    • Labels: None
    • Patch Info: Patch Available

    Description

      If a URL includes a port, e.g. http://example.com:8080/ or https://example.com:8443/, the port is removed when the URL is normalized by the urlnormalizer-protocol plugin.
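      The issue does not show the plugin's code. As a hypothetical illustration of how a port can get lost, the sketch below rebuilds a URL under a new protocol with java.net.URL. The class and method names are made up for this example. The 3-argument constructor URL(protocol, host, file) takes no port, so an explicit :8080 disappears; passing getPort() through keeps it, but then yields https://example.com:8080/, a protocol/port combination that the server may not serve at all.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical demo of port loss when a URL is rebuilt; not Nutch code. */
public class PortLossDemo {

  // Rebuilds the URL with URL(protocol, host, file): no port is passed,
  // so the result falls back to the new protocol's default port.
  public static String rebuildDroppingPort(String url, String newProtocol) {
    try {
      URL u = new URL(url);
      return new URL(newProtocol, u.getHost(), u.getFile()).toString();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  // Rebuilds the URL with URL(protocol, host, port, file). getPort() returns
  // -1 when the URL has no explicit port, which the constructor treats as
  // "no port", so URLs without a port are not affected.
  public static String rebuildKeepingPort(String url, String newProtocol) {
    try {
      URL u = new URL(url);
      return new URL(newProtocol, u.getHost(), u.getPort(), u.getFile()).toString();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String url = "http://example.com:8080/";
    System.out.println(rebuildDroppingPort(url, "https")); // https://example.com/
    System.out.println(rebuildKeepingPort(url, "https"));  // https://example.com:8080/
  }
}
```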

      Instead, if the port is set,

      • the port should be kept as is, and
      • the protocol should be left unchanged:
        • keeping the port while changing the protocol may result in a connection failure;
        • unlike the default port mapping (80 for http <> 443 for https), a non-default mapping such as 8080 <> 8443 is risky: it is only a convention and will not work on every server setup.
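      A minimal sketch of the requested behaviour, written in Java since Nutch is a Java project. It assumes a hypothetical host-to-protocol mapping (e.g. "example.com" -> "https") passed to the constructor; the class name ProtocolNormalizerSketch and its methods are illustrative only, not the plugin's API or the committed fix. The protocol is rewritten only when the URL has no explicit port, so a switch between http and https implicitly moves between the default ports 80 and 443.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

/** Hypothetical sketch: change the protocol only if no port is set. */
public class ProtocolNormalizerSketch {

  private final Map<String, String> hostToProtocol;

  public ProtocolNormalizerSketch(Map<String, String> hostToProtocol) {
    this.hostToProtocol = hostToProtocol;
  }

  public String normalize(String url) {
    URL u;
    try {
      u = new URL(url);
    } catch (MalformedURLException e) {
      return url; // leave unparsable input untouched
    }
    String wanted = hostToProtocol.get(u.getHost());
    // No mapping for this host, or the protocol is already the desired one.
    if (wanted == null || wanted.equals(u.getProtocol())) {
      return url;
    }
    // An explicit port is set: keep both port and protocol as they are,
    // since e.g. port 8080 may not serve the other protocol.
    if (u.getPort() != -1) {
      return url;
    }
    // No explicit port: switching the protocol implies the default port
    // of the new protocol. User info and #fragment are not preserved here.
    try {
      return new URL(wanted, u.getHost(), u.getFile()).toString();
    } catch (MalformedURLException e) {
      return url;
    }
  }

  public static void main(String[] args) {
    ProtocolNormalizerSketch n =
        new ProtocolNormalizerSketch(Map.of("example.com", "https"));
    System.out.println(n.normalize("http://example.com/page"));      // https://example.com/page
    System.out.println(n.normalize("http://example.com:8080/page")); // http://example.com:8080/page
  }
}
```

      Returning the URL unchanged whenever a port is present errs on the side of not rewriting: a URL that is fetched over its original protocol and port is at least as likely to work as a guessed https://host:8443/ variant.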

    People

      • Assignee: Sebastian Nagel (snagel)
      • Reporter: Sebastian Nagel (snagel)
      • Votes: 0
      • Watchers: 4