  1. Tika
  2. TIKA-1044

Can't parse Word files with no format set

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.3
    • Component/s: parser
    • Labels: None

    Description

      We came across this Tika bug while using Solr for indexing.
      When parsing a .doc or .docx file that contains text with no format set (that is, no formatting applied inside Microsoft Word), the parser throws exceptions.
      Once a format is applied to the text, the file parses correctly without unexpected errors.

      Attachments

        1. test.docx
          36 kB
          Jonas Wilhelmsson
        2. test2.doc
          73 kB
          Jonas Wilhelmsson

        Activity

          People

            Assignee: Unassigned
            Reporter: Jonas Wilhelmsson (jonas.wilhelmsson)
            Votes: 0
            Watchers: 2