Lucene - Core / LUCENE-190

[PATCH] Demo HTML parser does not properly handle meta tag attributes.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/examples
    • Labels: None
    • Environment: Operating System: All
      Platform: All
    • Bugzilla Id: 27423

    Description

      Affects version 1.3 final.

      The meta tag parsing in the demo HTML parser
      (demo/org/apache/lucene/demo/html/HTMLParser.jj) incorrectly relies on the meta
      tag's "name" attribute coming before its "content" attribute. In XML/HTML,
      attribute order is supposed to be insignificant.

      So, if I have tags:

      <meta content="blah" name="blarg" />
      <meta content="gluh" name="glarg" />

      ...the parser will not parse them correctly. (In fact, it simply pairs up
      name and content values as it encounters attributes in the stream, without
      regard to which meta tag each attribute is actually in. So, in the above
      example, I get a single meta property of "blarg"="gluh".)
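The difference between the two pairing strategies can be sketched in plain Java. This is only an illustrative model of the behavior described above, not the code generated from HTMLParser.jj: the class `MetaTagSketch`, its methods, and the representation of each tag as a list of `{attribute, value}` pairs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each <meta> tag is a list of {attributeName, value}
// pairs in source order. None of these names come from the Lucene demo code.
class MetaTagSketch {

    // Models the reported behavior: a "name" value is remembered and paired
    // with the next "content" value in the stream, ignoring tag boundaries.
    static Map<String, String> pairAcrossStream(List<List<String[]>> tags) {
        Map<String, String> meta = new LinkedHashMap<>();
        String pendingName = null;
        for (List<String[]> tag : tags) {
            for (String[] attr : tag) {
                if (attr[0].equalsIgnoreCase("name")) {
                    pendingName = attr[1];
                } else if (attr[0].equalsIgnoreCase("content") && pendingName != null) {
                    meta.put(pendingName, attr[1]);
                    pendingName = null;
                }
            }
        }
        return meta;
    }

    // Order-independent alternative: collect both attributes of one tag
    // first, and only record the pair once that tag has been fully read.
    static Map<String, String> pairPerTag(List<List<String[]>> tags) {
        Map<String, String> meta = new LinkedHashMap<>();
        for (List<String[]> tag : tags) {
            String name = null;
            String content = null;
            for (String[] attr : tag) {
                if (attr[0].equalsIgnoreCase("name")) {
                    name = attr[1];
                } else if (attr[0].equalsIgnoreCase("content")) {
                    content = attr[1];
                }
            }
            if (name != null && content != null) {
                meta.put(name, content);
            }
        }
        return meta;
    }

    // The two tags from the report, with "content" written before "name".
    static List<List<String[]>> sampleTags() {
        List<List<String[]>> tags = new ArrayList<>();
        tags.add(Arrays.asList(new String[] {"content", "blah"}, new String[] {"name", "blarg"}));
        tags.add(Arrays.asList(new String[] {"content", "gluh"}, new String[] {"name", "glarg"}));
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(pairAcrossStream(sampleTags())); // prints {blarg=gluh}
        System.out.println(pairPerTag(sampleTags()));       // prints {blarg=blah, glarg=gluh}
    }
}
```

In this model, resetting the collected attributes at each tag boundary is what makes the result independent of attribute order. How that would map onto the actual grammar productions is not shown here.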

      This is a problem because my XSLT happens to result in meta tags with attributes
      in the above order.

      It may not seem like a big deal since it's in demo code, but because
      HTMLParser.jj is many times faster than more heavy-weight solutions, I'd love
      for this to be fixed, if possible.

    People

      Assignee: Lucene Developers (java-dev@lucene.apache.org)
      Reporter: Matt Chaput (matt@sidefx.com)