TIKA-2786: Recursive Embedded Word Doc (pre 2006) fails

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.19.1
    • Fix Version/s: None
    • Component/s: cli
    • Labels: None
    • Environment: Tika CLI

    Description

      An embedded document (.DOC) that itself contains another embedded document (.DOC) is not extracted correctly by the Tika CLI (FileEmbeddedDocumentExtractor).

      Only the first level of embedded document opens. Any document embedded within it gives an error in MS Word, i.e. "The server application, source file or item cannot be found".

      A test file is attached.

      To test this, run the following command and open the extracted document. You will notice another embedded document within it, which does not open.

      java -jar tika-app-1.19.1.jar -v -z Recursive-Embedding-test.doc
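
As a quick first check before opening the extracted files in Word, the sketch below tests whether each file begins with the 8-byte OLE2 compound file signature that legacy .DOC files use. It is only a diagnostic aid and not part of Tika; the class name Ole2SignatureCheck and its methods are hypothetical, and the paths passed on the command line are whatever files the command above produced. A missing signature would suggest the file was not written as a standalone compound document, while a present signature does not prove that Word can open it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks files for the OLE2 signature.
public class Ole2SignatureCheck {

    // First 8 bytes of every OLE2 compound file (the container behind legacy .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given leading bytes start with the OLE2 signature.
    static boolean hasOle2Signature(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads up to the first 8 bytes of the file and checks them.
    static boolean hasOle2Signature(Path file) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than the signature
                }
                n += r;
            }
        }
        return hasOle2Signature(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a file extracted by tika-app.
        for (String arg : args) {
            Path p = Paths.get(arg);
            String verdict = hasOle2Signature(p) ? "OLE2 signature present" : "no OLE2 signature";
            System.out.println(p + ": " + verdict);
        }
    }
}
```

To use it, compile the class with javac and pass the paths of the extracted files as arguments; one line is printed per file.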

       

      Attachments

        1. Recursive-Embedding-test.doc
          237 kB
          BALVINDER SINGH DANG

          People

            Assignee: Unassigned
            Reporter: BALVINDER SINGH DANG (bdang)
            Votes: 0
            Watchers: 1
