Details
Description
Many websites produce faulty URLs with multiple consecutive slashes, e.g. http://cocoon.apache.org///////////////////////1.x/dynamic.html
This is especially harmful when the number of slashes varies: many distinct URLs then point to the same page, and each of them can yield further new (unique) URLs to the same or other duplicate pages.
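To illustrate the kind of normalization being asked for, here is a minimal sketch using java.util.regex. It is not the rule Nutch ships; the class name SlashCollapser and the method collapse are hypothetical. It assumes absolute URLs of the form scheme://host/path and collapses runs of slashes in the path only, leaving the "://" separator, the query string, and the fragment untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch.
final class SlashCollapser {
    // Matches any run of two or more consecutive slashes.
    private static final Pattern MULTI_SLASH = Pattern.compile("/{2,}");

    static String collapse(String url) {
        // Assumption: the URL is absolute, so the path starts at the first
        // slash after "scheme://host".
        int schemeEnd = url.indexOf("://");
        int pathStart = (schemeEnd < 0) ? 0 : url.indexOf('/', schemeEnd + 3);
        if (pathStart < 0) {
            return url; // no path component, nothing to collapse
        }

        // The path ends at the first '?' or '#', or at the end of the string.
        int pathEnd = url.length();
        for (int i = pathStart; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '?' || c == '#') {
                pathEnd = i;
                break;
            }
        }

        String path = url.substring(pathStart, pathEnd);
        String collapsed = MULTI_SLASH.matcher(path).replaceAll("/");
        return url.substring(0, pathStart) + collapsed + url.substring(pathEnd);
    }
}
```

With this sketch, the example URL from the description and any variant of it with a different number of slashes both map to http://cocoon.apache.org/1.x/dynamic.html, so they would count as a single page.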
Attachments
Issue Links
- is blocked by NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex (Closed)