Description
Some Nutch components should reject invalid URLs by default. This was recently discussed on the users mailing list and has affected my work for a while. Although there may be special-purpose reasons to collect invalid URLs, they are generally not useful for crawling.