Nutch / NUTCH-2074

Javascript link not parsed by JSParseFilter

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.10
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels: None

    Description

      JSParseFilter cannot properly extract this link:

      javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/dummy-page.html?TB_iframe=true&height=310&width=600','');

      I ran a JUnit test in debug mode, and it seems that the regular expression JSParseFilter.STRING_PATTERN matches only ',' and does not extract the URL.
      As I am not very good with regular expressions, I can't propose a patch.

      The complete HTML element is:
      <a class="last" href="javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/dummy-page.html?TB_iframe=true&height=310&width=600','');">Dummy url</a>
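
      The behaviour described above can be reproduced outside Nutch with a small sketch. Note that STRICT below is only an assumed stand-in for JSParseFilter.STRING_PATTERN (a quote, one or more non-space, non-quote characters, then the same quote); it is not copied from the Nutch source. TOLERANT is a hypothetical variant, not a proposed patch, that also accepts an empty literal such as '' so that the quotes pair up correctly. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStringDemo {

  // ASSUMPTION: stand-in for JSParseFilter.STRING_PATTERN, not the real one.
  // '+?' demands at least one character between the opening and closing quote.
  static final Pattern STRICT = Pattern.compile("([\"'])([^\\s\"']+?)\\1");

  // Hypothetical variant: '*?' lets the empty literal '' match as a pair,
  // so the following quote is treated as an opening quote again.
  static final Pattern TOLERANT = Pattern.compile("([\"'])([^\\s\"']*?)\\1");

  // The href value from the report.
  static final String INPUT =
      "javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/"
          + "dummy-page.html?TB_iframe=true&height=310&width=600','');";

  // Collects the non-empty contents of every quoted string the pattern finds.
  static List<String> extract(Pattern pattern, String js) {
    List<String> found = new ArrayList<>();
    Matcher m = pattern.matcher(js);
    while (m.find()) {
      if (!m.group(2).isEmpty()) {
        found.add(m.group(2));
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println("strict:   " + extract(STRICT, INPUT));
    System.out.println("tolerant: " + extract(TOLERANT, INPUT));
  }
}
```

      With the strict pattern, the first quote of '' cannot start a match, so the second quote is taken as an opening quote and pairs with the quote that begins the URL. The only text captured is the comma between them, which is consistent with what the debugger showed. The tolerant variant consumes '' as a unit and then captures the URL.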

          People

            Assignee: Unassigned
            Reporter: Hadjiat Souad (shadjiat)