  1. Nutch
  2. NUTCH-1990

Use URI.normalize() in BasicURLNormalizer


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10, 2.3.1
    • Component/s: None
    • Labels:
      None

      Description

      One of the things that BasicURLNormalizer does is remove unnecessary dot segments from the URL path.

      Instead of implementing the logic ourselves with an antiquated regex library, we should simply use http://docs.oracle.com/javase/7/docs/api/java/net/URI.html#normalize(), which does the same and is probably more efficient.
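A minimal sketch of what the proposed replacement could look like, using only `java.net.URI` from the JDK. The class name `DotSegmentDemo` and the helper `removeDotSegments` are hypothetical and are not taken from the attached patches; the actual change to BasicURLNormalizer may differ.

```java
import java.net.URI;

// Hypothetical demo class, not part of Nutch.
final class DotSegmentDemo {

    // Parses the URL and lets java.net.URI collapse "." and ".." path segments.
    // URI.create throws IllegalArgumentException if the string is not a valid URI.
    static String removeDotSegments(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        // "./" is dropped and "b/.." cancels out.
        System.out.println(removeDotSegments("http://example.com/a/./b/../c"));
        // prints "http://example.com/a/c"
    }
}
```

One caveat worth checking before relying on this: per the Javadoc, a normalized path may still begin with ".." segments when there are not enough preceding segments to remove them, so URLs that try to climb above the root would probably need separate handling.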

        Attachments

        1. NUTCH-1990-trial1.patch
          3 kB
          Sebastian Nagel
        2. NUTCH-1990-v1.patch
          10 kB
          Sebastian Nagel

          Activity

            People

            • Assignee:
              jnioche Julien Nioche
            • Reporter:
              jnioche Julien Nioche
            • Votes:
              0
            • Watchers:
              3
