Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-85

HTMLParser can't skip to parse some javascript code

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.2
    • None
    • modules/examples
    • None
    • Operating System: other
      Platform: PC

    • 16952

    Description

      I found that org/apache/lucene/demo/html/HTMLParser.jj have a bug. I tried to
      parse the following HTML content:

      <script language="JavaScript">
      function preset() {
      var art_id=GetParamValue("art_id");
      // alert("bbbb"+art_id);
      if(isNaN(art_id) || art_id=="")

      { document.dymenu.article_id.selectedIndex=2; // alert("aaaa"); return; }

      for(var i=1;i<document.dymenu.article_id.options.length;i++)

      { if(document.dymenu.article_id.options[i].value==art_id) // line 625 break; }

      document.dymenu.article_id.selectedIndex=i;
      return;
      }
      preset();
      </script></td></tr><tr><td align=right>
      ++++++++++++++++++++++++++++
      it threw an exception:

      adding ../projecthand/applenews2.html
      Parse Aborted: Lexical error at line 625, column 60. Encountered: "=" (61),
      after : ""

      ++++++++++++++++++++++++++++++

      After i added comment tags "<!-" and "//->" inside "<script>...</script>"
      tags. it worked again. i think the HTMLParser should skip the javascript code
      without comment tags. Also i tried another javascript code block in the same
      file just before the above javascript code block and HTMLParser able to skip
      but still fail to parse the above javascript code block ....

      Attachments

        Activity

          People

            java-dev@lucene.apache.org Lucene Developers
            tommy.cheung@arontac.com Tommy Cheung
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: