NUTCH-1174: Outlinks are not properly normalized

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

      Description

      In ParseOutputFormat, the toUrl string is read from each Outlink and then filtered and normalized. However, the object added to the output is the original Outlink, which still holds the unnormalized URL. The normalized URL in toUrl is never written back to the Outlink object.

      This issue adds a setUrl method to Outlink, which ParseOutputFormat uses to overwrite the unnormalized URL with the normalized one.
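
The write-back described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the Outlink class here is a simplified stand-in for Nutch's class, and normalize() and collect() are hypothetical helpers standing in for the URL normalizers, filters, and the loop in ParseOutputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Nutch's Outlink (assumption: the real class has more to it).
class Outlink {
    private String toUrl;
    private final String anchor;

    Outlink(String toUrl, String anchor) {
        this.toUrl = toUrl;
        this.anchor = anchor;
    }

    String getToUrl() { return toUrl; }
    String getAnchor() { return anchor; }

    // The kind of setter this issue describes adding.
    void setUrl(String toUrl) { this.toUrl = toUrl; }
}

class OutlinkNormalizationSketch {

    // Hypothetical normalizer: trims whitespace and drops any #fragment.
    static String normalize(String url) {
        String trimmed = url.trim();
        int hash = trimmed.indexOf('#');
        return hash >= 0 ? trimmed.substring(0, hash) : trimmed;
    }

    // Hypothetical stand-in for the outlink loop in ParseOutputFormat.
    static List<Outlink> collect(Outlink[] links) {
        List<Outlink> kept = new ArrayList<>();
        for (Outlink link : links) {
            String toUrl = normalize(link.getToUrl());
            if (toUrl.isEmpty()) {
                continue; // stand-in for a URL filter rejecting the link
            }
            // Write the normalized URL back; without this call the Outlink
            // that gets stored would still carry the raw, unnormalized URL.
            link.setUrl(toUrl);
            kept.add(link);
        }
        return kept;
    }

    public static void main(String[] args) {
        Outlink[] links = { new Outlink(" http://example.com/a#frag ", "a") };
        for (Outlink link : collect(links)) {
            System.out.println(link.getToUrl());
        }
    }
}
```

Removing the link.setUrl(toUrl) line in this sketch reproduces the reported behaviour: the local string is normalized, but the stored Outlink keeps the original URL.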

        Attachments

        1. NUTCH-1174-1.5-1.patch
          1 kB
          Markus Jelsma

              People

              • Assignee: Markus Jelsma (markus17)
              • Reporter: Markus Jelsma (markus17)
