Project: Nutch
Parent issue: NUTCH-1483 Can't crawl filesystem with protocol-file plugin
Issue: NUTCH-1879

Regex URL normalizer should remove multiple slashes after file: protocol

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.2.1
    • Fix Version/s: 2.3, 1.10
    • Component/s: protocol
    • Labels: None
    • Patch Info: Patch Available

    Description

      urlnormalizer-regex should replace multiple slashes after the file: protocol with a single slash (file:/// -> file:/):

      • required by NUTCH-1483 to get a consistent canonical form for file URLs, because URL.toString() also emits the single-slash form
      • would make NUTCH-1878 obsolete
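
The requested normalization can be sketched in plain Java as below. This is only an illustration, not the content of the attached patch: the class name, the method name, and the pattern `^file:/+` are assumptions chosen for the example, and the actual rule in the urlnormalizer-regex configuration may be written differently. The `main` method also prints what `java.net.URL` emits for the same input, so the two forms can be compared.

```java
import java.net.URI;
import java.net.URL;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch.
class FileUrlSlashes {

    // Assumed pattern: "file:" at the start of the URL, followed by one or more slashes.
    private static final Pattern FILE_SLASHES = Pattern.compile("^file:/+");

    // Collapse the run of slashes after "file:" into a single slash.
    static String normalize(String url) {
        return FILE_SLASHES.matcher(url).replaceFirst("file:/");
    }

    public static void main(String[] args) throws Exception {
        String input = "file:///tmp/docs/a.txt";

        // Regex-based normalization: file:/tmp/docs/a.txt
        System.out.println(normalize(input));

        // Form produced by java.net.URL for the same input, for comparison.
        URL url = URI.create(input).toURL();
        System.out.println(url.toString());
    }
}
```

URLs that do not start with `file:` are returned unchanged, and a URL already in the single-slash form maps to itself, so applying the rule repeatedly is harmless.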

      Attachments

        1. NUTCH-1879-v1.patch
          2 kB
          Sebastian Nagel

            People

              Assignee: Unassigned
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 3
