TIKA-873

Tika --extract fails for DOC

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: general
    • Labels: None
    • Environment: Windows 7 + Java v1.6

    Description

      A file that is embedded in a DOC file doesn't get extracted to disk.

      To "embed" a file in a DOC, simply drag and drop it into a DOC document in MS Word 2010. Word then creates an EMF preview image of the embedded file.

      See attached file "embedded.doc" for an example input file that fails with Tika v1.0.

      Attachments

        1. embedded.doc
          1.63 MB
          Albert L.

          People

            Assignee: Unassigned
            Reporter: Albert L. (albertlaw)
            Votes: 1
            Watchers: 1
