Nutch / NUTCH-1483 Can't crawl filesystem with protocol-file plugin / NUTCH-1879

Regex URL normalizer should remove multiple slashes after file: protocol


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.2.1
    • Fix Version/s: 2.3, 1.10
    • Component/s: protocol
    • Labels: None
    • Patch Info: Patch Available

    Description

      urlnormalizer-regex should replace multiple slashes after the file: protocol with a single slash (file:/// -> file:/). This change:

      • is required by NUTCH-1483 to get a consistent canonical form for file URLs, because URL.toString() also emits the single-slash form
      • would obsolete NUTCH-1878
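
      The requested rewrite can be sketched in plain Java as follows. This is only an illustration of the rule described above, not the patch attached to this issue: the class name FileUrlSlashes, the method name normalize, and the exact pattern are assumptions made for the example. Note that this sketch also collapses the slashes of a file://host/path URL, which may or may not be desired.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collapses the run of slashes
// that follows the "file:" scheme into a single slash.
class FileUrlSlashes {

    // Assumed pattern: "file:" at the start of the URL followed by one or more slashes.
    private static final Pattern FILE_SLASHES = Pattern.compile("^file:/+");

    static String normalize(String url) {
        // Only the leading match is rewritten; slashes later in the path are untouched.
        return FILE_SLASHES.matcher(url).replaceFirst("file:/");
    }
}
```

      With this sketch, FileUrlSlashes.normalize("file:///tmp/data") yields file:/tmp/data, an input that is already in single-slash form is returned unchanged, and non-file URLs such as http://example.com/ are not affected.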


          People

            Assignee: Unassigned
            Reporter: Sebastian Nagel (snagel)