NUTCH-952: Fix outlinks that start with '?' in the HTML parser

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      <a href="?w=ruby%20on%20rails&ty=c&sd=0" >ruby on rails</a> (a snippet from http://bbs.soso.com/search?ty=c&sd=0&w=rails)

      Outlink parsed from the link above: http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
      Expected outlink: http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
      The parser drops the last path segment of the base URL ("search") when the href consists only of a query string.
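
      For illustration only, a minimal Java sketch of the expected resolution. This is not the attached patch: the class name QueryOutlinkSketch and the method resolve are hypothetical. It assumes the rule from RFC 3986, section 5.2.2, that a reference consisting only of a query keeps the base URL's path and replaces just the query.

```java
import java.net.URI;

// Hypothetical helper, not taken from the Nutch code base or the attached patch.
final class QueryOutlinkSketch {

    // Resolves an outlink target against the URL of the page it was found on.
    static String resolve(String base, String target) {
        if (target.startsWith("?")) {
            // Query-only reference: keep scheme, host and path of the base,
            // drop the base's own query and fragment, then append the new query.
            int cut = base.length();
            int query = base.indexOf('?');
            int fragment = base.indexOf('#');
            if (query >= 0) {
                cut = Math.min(cut, query);
            }
            if (fragment >= 0) {
                cut = Math.min(cut, fragment);
            }
            return base.substring(0, cut) + target;
        }
        // Every other kind of reference is left to the standard resolver.
        return URI.create(base).resolve(target).toString();
    }

    public static void main(String[] args) {
        String base = "http://bbs.soso.com/search?ty=c&sd=0&w=rails";
        String href = "?w=ruby%20on%20rails&ty=c&sd=0";
        // prints http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
        System.out.println(resolve(base, href));
    }
}
```

      Handling the query-only case before delegating to a general resolver keeps the special case small and leaves all other relative links untouched.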

      Attachments

        1. NUTCH-952-v2.patch (2 kB, Stondet)
        2. test_nutch_952.html (0.2 kB, Sebastian Nagel)

            People

              Assignee: Unassigned
              Reporter: Stondet (store88)
              Votes: 0
              Watchers: 2