Description
Extend the URL Normalizer to allow normalization of the hostname during Generate. By default, no rules are applied.
In short, this allows foo.bar.com, bif.baz.bar.com and bar.com to be counted as the same host for generate.max.per.host, provided an appropriate regex is used.
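The effect can be illustrated with a minimal sketch. This is not the Nutch implementation: the function name normalize_host, the rule tuple HOST_RULE and the pattern itself are assumptions made for illustration only. The pattern keeps the last two labels of a hostname, so it would not handle suffixes such as co.uk correctly.

```python
import re
from collections import Counter

# Hypothetical rule (not taken from Nutch): reduce a hostname to its
# last two dot-separated labels, e.g. "foo.bar.com" -> "bar.com".
HOST_RULE = (re.compile(r"^(?:[^.]+\.)*([^.]+\.[^.]+)$"), r"\1")


def normalize_host(host, rules=()):
    """Apply each (pattern, substitution) rule in order.

    With the default empty rule set the hostname is returned unchanged,
    mirroring the "no rules applied by default" behaviour described above.
    """
    for pattern, substitution in rules:
        host = pattern.sub(substitution, host)
    return host


hosts = ["foo.bar.com", "bif.baz.bar.com", "bar.com", "example.org"]

# Per-host counts as a per-host limit would see them, with and without the rule.
counts_default = Counter(normalize_host(h) for h in hosts)
counts_with_rule = Counter(normalize_host(h, rules=(HOST_RULE,)) for h in hosts)
```

With the rule in place, the three bar.com hostnames fall under a single key, so a per-host limit would apply to them jointly rather than to each one separately.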
Add "urlnormalizer-regex" to plugin.includes in nutch-site.xml in order to enable it.
Because several modules now extend the urlnormalizer base, a "scope" parameter in plugin.xml distinguishes between the various urlnormalizer modules so that the right one is selected for Generate.