Tika / TIKA-895

Empty title element makes Tika-generated HTML documents not open

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.3
    • Component/s: metadata
    • Labels:
    • Environment: Windows 7

      Description

      I tried to convert an empty .docx file to an HTML file.
      Example: java -jar tika-app-1.1.jar -x example.docx > t.html

      The resulting HTML file cannot be opened in Firefox, Internet Explorer, or Chrome.

      The main problem is that a self-closing <title/> element appears to be forbidden by the HTML 4.01 specification (I could not find the corresponding statement for HTML5):

      http://www.w3.org/TR/html401/struct/global.html#h-7.4.2

      7.4.2 The TITLE element

      <!-- The TITLE element is not considered part of the flow of text.
           It should be displayed, for example as the page header or
           window title. Exactly one title is required per document.
        -->
      <!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title -->
      <!ATTLIST TITLE %i18n>

      Start tag: required, End tag: required
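
      The difference between the self-closing form and the form with an explicit end tag can be reproduced with the JDK's own serializer. The sketch below is only an illustration: it assumes the output is written through a standard JAXP TransformerHandler, which this report does not confirm for Tika, and the class and method names (EmptyTitleDemo, serializeEmptyTitle) are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; not part of Tika.
final class EmptyTitleDemo {

    // Serializes a single empty <title> element using the given JAXP
    // output method ("xml" or "html") and returns the resulting markup.
    static String serializeEmptyTitle(String method) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();

        // Output properties are set before the result is attached.
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Emit an element with no character content at all.
        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        handler.endElement("", "title", "title");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // With the JDK's built-in serializer, the "xml" method is expected to
        // minimize the empty element, while "html" keeps the end tag.
        System.out.println("xml : " + serializeEmptyTitle("xml"));
        System.out.println("html: " + serializeEmptyTitle("html"));
    }
}
```

      A browser that parses the file as HTML rather than XML treats <title/> as an opening tag only, so everything after it is read as title text, which is consistent with the blank page described above.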

      For reference, the same bug was reported for XLS files:
      https://issues.apache.org/jira/browse/TIKA-725

      A simple fix would be to emit an empty title with explicit start and end tags (<title></title>) by default.
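
      For anyone stuck on an affected version, the same idea can be applied as a post-processing step on the generated file. This is a minimal workaround sketch, not the fix that was committed to Tika; TitleFixer and expandEmptyTitle are hypothetical names, and the pattern only covers a self-closing title without attributes.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of Tika.
final class TitleFixer {

    // Matches <title/> and <title />, ignoring case.
    private static final Pattern SELF_CLOSED_TITLE =
            Pattern.compile("<title\\s*/>", Pattern.CASE_INSENSITIVE);

    // Rewrites a self-closing title element into an explicit
    // start tag / end tag pair; all other markup is left untouched.
    static String expandEmptyTitle(String html) {
        return SELF_CLOSED_TITLE.matcher(html).replaceAll("<title></title>");
    }

    public static void main(String[] args) {
        String broken = "<html><head><title/></head><body/></html>";
        System.out.println(expandEmptyTitle(broken));
    }
}
```

      The helper could be applied to the contents of t.html from the command above before opening the file in a browser.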


            People

            • Assignee: Ray Gauss II
            • Reporter: Benoit MAGGI
            • Votes: 1
            • Watchers: 3
