Extend the URL Normalizer to allow normalization of the hostname during Generate. By default, no rules are applied.
In short, if an appropriate regex is used, this allows foo.bar.com, bif.baz.bar.com and bar.com to be counted as the same host for generate.max.per.host.
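To illustrate the effect, here is a minimal standalone sketch of collapsing hostnames with a regex before counting them. It is not the code from the patch and does not use the Nutch API: the class name HostNormalizerSketch, the methods normalizeHost and countPerHost, and the rule itself (keep only the last two labels of the hostname) are assumptions made up for this example.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class HostNormalizerSketch {

    // Hypothetical rule: drop every leading label and keep the last two,
    // e.g. "bif.baz.bar.com" -> "bar.com". Hosts with fewer than two labels
    // do not match and are returned unchanged.
    private static final Pattern RULE =
        Pattern.compile("^(?:[^.]+\\.)*([^.]+\\.[^.]+)$");

    // Lower-case the host, then apply the rule.
    static String normalizeHost(String host) {
        return RULE.matcher(host.toLowerCase(Locale.ROOT)).replaceFirst("$1");
    }

    // Count hosts after normalization, the way a per-host limit would see them.
    static Map<String, Integer> countPerHost(List<String> hosts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String host : hosts) {
            counts.merge(normalizeHost(host), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hosts =
            List.of("foo.bar.com", "bif.baz.bar.com", "bar.com", "example.org");
        System.out.println(countPerHost(hosts));
    }
}
```

With this rule the three bar.com variants fall into a single bucket, so a per-host limit would apply to them jointly. A rule this blunt would also merge unrelated sites under suffixes such as co.uk, so a real rule set would need to be more selective.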
Add "urlnormalizer-regex" to plugin.includes in nutch-site.xml in order to enable it.
Since several modules now extend the urlnormalizer base, a "scope" parameter in plugin.xml differentiates between the various urlnormalizer modules, so that the right module is selected for Generate.
NUTCH-365 in trunk. Changes were too intrusive to be ported to branch-0.8, although the patch in NUTCH-365 should apply more or less cleanly.