Uploaded image for project: 'Nutch'
Nutch / NUTCH-1011

Normalize duplicate slashes in URLs


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4, nutchgora
    • Fix Version/s: 1.4, nutchgora
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      Many websites produce faulty URLs with multiple consecutive slashes, e.g. http://cocoon.apache.org///////////////////////1.x/dynamic.html
      This is especially harmful when the number of slashes varies: many distinct URLs then point to the same page, and crawling them generates further unique URLs to the same or other duplicate pages.
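      To illustrate the kind of normalization this issue asks for, here is a minimal standalone Java sketch. It is not the attached patch, and `DuplicateSlashNormalizer` is a hypothetical class, not part of Nutch (in Nutch, rewriting of this kind is handled by URL normalizer plugins such as urlnormalizer-regex). The sketch assumes absolute URLs and collapses slashes only inside the path, because a query string such as `?next=http://x//y` may legitimately contain `//`.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): collapses runs of slashes in the
 * path of an absolute URL, leaving scheme, authority, query and fragment alone.
 */
public final class DuplicateSlashNormalizer {

    // Matches two or more consecutive slashes.
    private static final Pattern MULTI_SLASH = Pattern.compile("/{2,}");

    private DuplicateSlashNormalizer() {
    }

    public static String normalize(String url) {
        int schemeSep = url.indexOf("://");
        // Require "scheme://" with no '/', '?' or '#' before the separator;
        // anything else (e.g. a relative URL) is returned unchanged.
        if (schemeSep <= 0 || indexOfAny(url, "/?#", 0) < schemeSep) {
            return url;
        }
        // The path starts at the first '/' after the authority; if the
        // authority is followed directly by '?', '#' or nothing, there is no path.
        int pathStart = indexOfAny(url, "/?#", schemeSep + 3);
        if (pathStart < 0 || url.charAt(pathStart) != '/') {
            return url;
        }
        // The path ends where the query or fragment begins.
        int pathEnd = indexOfAny(url, "?#", pathStart);
        if (pathEnd < 0) {
            pathEnd = url.length();
        }
        String path = MULTI_SLASH.matcher(url.substring(pathStart, pathEnd)).replaceAll("/");
        return url.substring(0, pathStart) + path + url.substring(pathEnd);
    }

    // Index of the first character at or after 'from' that occurs in 'chars', or -1.
    private static int indexOfAny(String s, String chars, int from) {
        for (int i = from; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints http://cocoon.apache.org/1.x/dynamic.html
        System.out.println(normalize("http://cocoon.apache.org///////////////////////1.x/dynamic.html"));
        // prints http://example.com/a/b?next=http://x//y
        System.out.println(normalize("http://example.com//a//b?next=http://x//y"));
    }
}
```

      Limiting the rewrite to the path is a deliberate choice: a single regex applied to the whole URL would need a guard (for example a lookbehind for `:`) to keep `http://` intact, and would still alter slashes inside query strings.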

      Attachments

        1. NUTCH-1011-all-3.patch (0.4 kB, Markus Jelsma)
        2. NUTCH-1011-1.4-2.patch (1 kB, Markus Jelsma)


            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 0
