Tika / TIKA-3968

Reconstruct embedded file names from associated emf files within docx files

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: None
    • Labels: None

    Description

      Several users have contacted me privately about what looks like a change in how Microsoft stores files attached to docx files (and possibly pptx and xlsx). Rather than storing the original file name, the document associates an EMF file with each attachment. The file name that a human sees in the application is drawn (painted) as text in the EMF file, but does NOT exist in any of the XML.
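
      To illustrate what "painted in the EMF file" means, here is a minimal sketch in Python. It is not Tika's implementation. It assumes the label is drawn with EMR_EXTTEXTOUTW records (record type 84 in the EMF specification) holding UTF-16LE text, and the function name emf_text_runs is hypothetical. A label may be split across several text records, so the caller may need to join the runs.

```python
import struct

EMR_EOF = 14          # last record in an EMF stream
EMR_EXTTEXTOUTW = 84  # draws a UTF-16LE string
MIN_TEXT_RECORD = 76  # fixed-size part of an EMR_EXTTEXTOUTW record


def emf_text_runs(data: bytes) -> list[str]:
    """Return the strings drawn by EMR_EXTTEXTOUTW records, in file order."""
    runs = []
    pos = 0
    while pos + 8 <= len(data):
        # Every EMF record starts with a little-endian type and total size.
        rec_type, rec_size = struct.unpack_from("<II", data, pos)
        # Stop on a malformed size rather than reading out of bounds.
        if rec_size < 8 or rec_size % 4 or pos + rec_size > len(data):
            break
        if rec_type == EMR_EXTTEXTOUTW and rec_size >= MIN_TEXT_RECORD:
            # Character count is at offset 44; the string offset (relative
            # to the start of the record) is at offset 48.
            n_chars, off_string = struct.unpack_from("<II", data, pos + 44)
            start = pos + off_string
            end = start + 2 * n_chars
            if off_string >= MIN_TEXT_RECORD and end <= pos + rec_size:
                runs.append(data[start:end].decode("utf-16-le", errors="replace"))
        if rec_type == EMR_EOF:
            break
        pos += rec_size
    return runs
```

      For example, emf_text_runs(open("image1.emf", "rb").read()) would list the text runs found in one of the EMF files attached to this issue, assuming that file uses EMR_EXTTEXTOUTW records for its label.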

      I'm attaching an example file.

      While fixing this issue, I noticed that some of our fairly old docx files also use this technique. It is not clear that this is new behavior; I just happen to be hearing about it from several people now.

      I'd like to thank Chetan Bikire (ChetanB) for raising this issue and sharing the example document which we've added to our unit tests.

      Attachments

        1. testWORD has attachment.docx
          60 kB
          Tim Allison
        2. symbol.docx
          12 kB
          ChetanB
        3. oleObject2.bin
          41 kB
          Tim Allison
        4. oleObject1.bin
          3 kB
          Tim Allison
        5. Microsoft_Word_Document.docx
          12 kB
          Tim Allison
        6. Inner Test Email.msg
          40 kB
          ChetanB
        7. image3.emf
          10 kB
          Tim Allison
        8. image-2023-02-06-15-58-20-443.png
          34 kB
          Ross Johnson
        9. image-2023-02-06-15-46-05-678.png
          23 kB
          Ross Johnson
        10. image2.emf
          10 kB
          Tim Allison
        11. image1-2.emf
          8 kB
          Ross Johnson
        12. image1-1.emf
          8 kB
          Ross Johnson
        13. image1.emf
          10 kB
          Tim Allison

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 5
