TIKA-873

Tika --extract fails for DOC

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: general
    • Labels:
      None
    • Environment:

      Windows 7 + Java v1.6

      Description

      A file that is embedded in a DOC file is not extracted to disk.

      To "embed" a file in a DOC, drag and drop it into a document open in MS Word 2010. Word then creates an EMF image of the embedded file's preview.

      See attached file "embedded.doc" for an example input file that fails with Tika v1.0.
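
A minimal reproduction sketch, assuming the Tika 1.0 command-line jar is named tika-app-1.0.jar and sits in the current directory next to the attached embedded.doc (both locations are assumptions, not stated in the report). It uses the tika-app `--extract` option, which is meant to write embedded attachments to the current directory:

```shell
# Assumed file names; adjust to wherever the jar and the sample actually live.
TIKA_JAR="tika-app-1.0.jar"
INPUT="embedded.doc"

if [ -f "$TIKA_JAR" ] && [ -f "$INPUT" ]; then
  # --extract asks tika-app to write embedded attachments to the current directory.
  java -jar "$TIKA_JAR" --extract "$INPUT"
  # List the directory so any extracted files can be checked by eye.
  ls -l
else
  echo "missing $TIKA_JAR or $INPUT in the current directory" >&2
fi
```

If the bug described above is present, the embedded file should be absent from the directory listing after the run.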

        Attachments

        1. embedded.doc
          1.63 MB
          Albert L.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Albert L. (albertlaw)
            • Votes:
              1
            • Watchers:
              1
