Tika / TIKA-914

Invalid self-closing title tag when parsing an RTF file

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.1
    • Fix Version/s: None
    • Component/s: parser
    • Environment: Reproduced on Linux and Windows

    Description

      When parsing an RTF file whose TITLE metadata field is empty, the resulting HTML contains a self-closing title tag:

      $ java -jar tika-app-1.1.jar -h test.rtf
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta name="Content-Length" content="830468"/>
      <meta name="Content-Type" content="application/rtf"/>
      <meta name="resourceName" content="test.rtf"/>
      <title/>
      </head>
      [...]
      

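      I believe the self-closing form is not appropriate here: according to http://www.w3.org/TR/xhtml1/#C_3, an empty element whose content model is not EMPTY (such as title) should not use the minimized form. (However, no XHTML doctype is generated here, just a namespace.) In any case, this causes some browsers, such as Chrome, to fail to parse the HTML and display a blank page, most likely because an HTML parser treats <title/> as an opening tag and reads the rest of the document as title text.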

      The expected output would be an empty tag with an explicit closing tag: <title></title>
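
      As a possible stopgap on the caller's side, the output could be post-processed before it is handed to a browser. The sketch below is only an illustration: EmptyTitleFix and expandEmptyTitle are hypothetical names, not part of the Tika API, and the regular expression assumes serializer output like the sample above (no '>' inside attribute values).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Tika API.
final class EmptyTitleFix {

    // Matches a self-closing title tag such as <title/> or <title lang="en" />.
    // Group 1 captures any attributes (possibly empty) so they are preserved.
    private static final Pattern SELF_CLOSING_TITLE =
            Pattern.compile("<title((?:\\s[^>]*?)?)\\s*/>");

    // Rewrites <title/> as <title></title>; all other markup is left untouched.
    static String expandEmptyTitle(String xhtml) {
        return SELF_CLOSING_TITLE.matcher(xhtml).replaceAll("<title$1></title>");
    }

    public static void main(String[] args) {
        String head = "<head>\n"
                + "<meta name=\"resourceName\" content=\"test.rtf\"/>\n"
                + "<title/>\n"
                + "</head>";
        System.out.println(expandEmptyTitle(head));
    }
}
```

      This only works around the symptom for the title element. A title that already has content, and void elements such as meta, are not modified.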

      Attachments

        1. test.rtf
          811 kB
          Nicolas Guillaumin

            People

              Assignee: rgauss Ray Gauss II
              Reporter: nguillaumin Nicolas Guillaumin
              Votes: 1
              Watchers: 0