Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9.0
    • Component/s: fetcher
    • Labels:
      None

      Description

      Extend the URL normalizer to allow normalization of the hostname during Generate. By default, no rules are applied.

      In short, this allows foo.bar.com, bif.baz.bar.com and bar.com to be counted as the same host for generate.max.per.host if an appropriate regex is used.
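
      As a rough illustration of the idea (not the Nutch implementation, which is Java), the following Python sketch collapses host names with a regex before counting URLs per host. The rule pattern and the names HOST_RULE, normalize_host and select_urls are hypothetical, chosen only for this example; the cap stands in for generate.max.per.host.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical rule: keep only the last two labels of the host name.
# This pattern is an illustration, not a rule shipped with Nutch.
HOST_RULE = (re.compile(r"^(?:[^.]+\.)*([^.]+\.[^.]+)$"), r"\1")


def normalize_host(host):
    """Apply the regex rule to a host name; hosts that do not match are unchanged."""
    pattern, substitution = HOST_RULE
    return pattern.sub(substitution, host.lower())


def select_urls(urls, max_per_host):
    """Keep at most max_per_host URLs per normalized host, preserving input order."""
    counts = Counter()
    selected = []
    for url in urls:
        key = normalize_host(urlsplit(url).hostname or "")
        if counts[key] < max_per_host:
            counts[key] += 1
            selected.append(url)
    return selected
```

      With this rule, foo.bar.com, bif.baz.bar.com and bar.com all share the key bar.com, so a cap of 2 admits only two of the three hosts' URLs. Without any rule, each host would be counted separately.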

      Add "urlnormalizer-regex" to plugin.includes in nutch-site.xml in order to enable it.

      Since several modules now extend the urlnormalizer base, a "scope" parameter in plugin.xml differentiates between the various urlnormalizer modules so that the right one is selected for Generate.

            People

            • Assignee:
              Unassigned
              Reporter:
              rbt Rod Taylor