  Nutch / NUTCH-1174

Outlinks are not properly normalized


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      In ParseOutputFormat, the target URL (toUrl) is read from each Outlink and then filtered and normalized. However, the object added to the output is the original Outlink, not one carrying the processed URL: the normalized value in toUrl is never written back to the Outlink, so the unnormalized URL is what gets stored.

      This issue adds a setUrl method to Outlink, which ParseOutputFormat uses to overwrite the unnormalized URL with the normalized one before the outlink is added.
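      The difference between the buggy and fixed behavior can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Nutch code: SimpleOutlink is a hypothetical stand-in for Nutch's Outlink, normalize() is a hypothetical stand-in for Nutch's normalizer and filter chain (it lower-cases scheme and host, resolves dot segments via java.net.URI.normalize(), and drops the fragment), and collectBuggy/collectFixed only mimic the relevant loop in ParseOutputFormat. The setUrl name follows the description above; its exact signature in the patch is assumed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified stand-in for Nutch's Outlink (not the real class). */
class SimpleOutlink {
  private String toUrl;
  private final String anchor;

  SimpleOutlink(String toUrl, String anchor) {
    this.toUrl = toUrl;
    this.anchor = anchor;
  }

  String getToUrl() { return toUrl; }
  String getAnchor() { return anchor; }

  // Assumed shape of the setter the patch adds: overwrite the stored target URL.
  void setUrl(String toUrl) { this.toUrl = toUrl; }
}

class OutlinkNormalizationSketch {
  /**
   * Hypothetical stand-in normalizer: lower-cases scheme and host, resolves
   * "." and ".." path segments, and drops the fragment. Returns null for
   * malformed or non-absolute URLs, which the callers treat as "filtered out".
   */
  static String normalize(String url) {
    try {
      URI u = new URI(url).normalize();
      if (u.getScheme() == null || u.getHost() == null) {
        return null;
      }
      URI cleaned = new URI(
          u.getScheme().toLowerCase(Locale.ROOT),
          u.getUserInfo(),
          u.getHost().toLowerCase(Locale.ROOT),
          u.getPort(),
          u.getPath(),
          u.getQuery(),
          null); // fragment dropped
      return cleaned.toString();
    } catch (URISyntaxException e) {
      return null;
    }
  }

  /** Buggy pattern: the normalized string is computed, then thrown away. */
  static List<SimpleOutlink> collectBuggy(List<SimpleOutlink> links) {
    List<SimpleOutlink> out = new ArrayList<>();
    for (SimpleOutlink link : links) {
      String toUrl = normalize(link.getToUrl());
      if (toUrl == null) {
        continue; // filtered out
      }
      out.add(link); // still carries the original, unnormalized URL
    }
    return out;
  }

  /** Fixed pattern: write the normalized URL back before adding the outlink. */
  static List<SimpleOutlink> collectFixed(List<SimpleOutlink> links) {
    List<SimpleOutlink> out = new ArrayList<>();
    for (SimpleOutlink link : links) {
      String toUrl = normalize(link.getToUrl());
      if (toUrl == null) {
        continue; // filtered out
      }
      link.setUrl(toUrl); // overwrite the unnormalized URL
      out.add(link);
    }
    return out;
  }
}
```

      With an input such as "HTTP://Example.COM/a/../b#frag", collectBuggy keeps that string unchanged, while collectFixed stores "http://example.com/b". Writing back through a setter mutates the existing object instead of constructing a new one, which keeps any other fields (here, the anchor text) intact.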

      Attachments

        1. NUTCH-1174-1.5-1.patch
          1 kB
          Markus Jelsma


            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 0
