  1. Tika
  2. TIKA-349

HtmlParser's http-equiv code needs to be more flexible

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels: None

    Description

      Some http-equiv meta tags in HTML documents have charset attributes that currently aren't handled properly.

      For example, the charset parameter may be repeated: <meta http-equiv="content-type" content="text/html; charset=utf-8; charset=UTF-8">

      Or the content value may contain an empty parameter, as in content="text/html;; charset=utf-8" (note the double semicolons).

      The parsing code needs to be more flexible to handle these edge cases.
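
      A minimal sketch of one way to read the charset leniently, assuming a regular-expression scan of the content value. This is not the code in TIKA-349.patch; the class name LenientCharsetSketch and the method extractCharset are hypothetical names chosen for illustration. The sketch takes the first charset parameter it finds, ignores empty parameters produced by doubled semicolons, and tolerates optional quotes around the value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika's HtmlParser.
final class LenientCharsetSketch {

    // Finds a "charset" parameter anywhere in the value. The parameter must
    // start the string or follow a semicolon or whitespace. Whitespace around
    // "=" and an optional opening quote are tolerated; the captured name
    // stops at whitespace, a semicolon, or a quote.
    private static final Pattern CHARSET = Pattern.compile(
            "(?i)(?:^|[;\\s])charset\\s*=\\s*[\"']?([^\\s;\"']+)");

    /** Returns the first charset name found in the value, or null if none. */
    static String extractCharset(String content) {
        if (content == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Repeated charset parameter: the first occurrence is used.
        System.out.println(extractCharset("text/html; charset=utf-8; charset=UTF-8"));
        // Doubled semicolons: the empty parameter is skipped.
        System.out.println(extractCharset("text/html;; charset=utf-8"));
        // No charset parameter at all.
        System.out.println(extractCharset("text/html"));
    }
}
```

      Taking the first occurrence is a design choice made for this sketch; a parser could equally prefer the last one. The returned name should still be validated (for example with java.nio.charset.Charset.isSupported) before it is used for decoding.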

      Attachments

        1. TIKA-349.patch
          5 kB
          Kenneth William Krugler

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Kenneth William Krugler (kkrugler)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: