Tika / TIKA-3968

Reconstruct embedded file names from associated emf files within docx files

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: None
    • Labels: None

    Description

      Several users have told me privately that Microsoft appears to have changed how files attached to docx files (and possibly pptx and xlsx) are stored. Rather than storing the original file name, the document associates an EMF file with each attachment. The file name that a human sees in the application is drawn as text in the EMF file, but does NOT exist in any of the XML.
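
To illustrate where such a name lives, here is a minimal sketch that walks the records of an EMF file and collects the strings drawn by EMR_EXTTEXTOUTW records (record type 84 in the MS-EMF format). This is not Tika's implementation; the function name `emf_text_runs` is hypothetical, and the sketch assumes the name is drawn with Unicode text-out records rather than some other record type.

```python
import struct

EMR_EXTTEXTOUTW = 84   # MS-EMF record type: draw a UTF-16LE string
FIXED_PART = 76        # bytes in the record before the string may begin


def emf_text_runs(data: bytes) -> list[str]:
    """Return the strings drawn by EMR_EXTTEXTOUTW records, in file order."""
    runs = []
    pos = 0
    while pos + 8 <= len(data):
        # Every EMF record starts with a 4-byte type and a 4-byte total size.
        rec_type, rec_size = struct.unpack_from("<II", data, pos)
        if rec_size < 8 or pos + rec_size > len(data):
            break  # malformed or truncated record: stop rather than guess
        if rec_type == EMR_EXTTEXTOUTW and rec_size >= FIXED_PART:
            # Character count and string offset sit at bytes 44 and 48;
            # the offset is relative to the start of the record.
            n_chars, off_string = struct.unpack_from("<II", data, pos + 44)
            start = pos + off_string
            end = start + 2 * n_chars
            if off_string >= FIXED_PART and end <= pos + rec_size:
                runs.append(data[start:end].decode("utf-16-le", errors="replace"))
        pos += rec_size
    return runs
```

Called on the bytes of an attached icon file such as image1.emf, this returns the text runs in drawing order. A long name may be split across several records, so a caller would probably need to join the runs; that step is left out here because the right joining rule is an assumption.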

      I'm attaching an example file.

      While fixing this issue, I noticed that some of our fairly old docx files use this technique. It is not clear that this is a new behavior; I just happen to be hearing about it from several people.

      I'd like to thank Chetan Bikire (chetab) for raising this issue and sharing the example document, which we've added to our unit tests.

      Attachments

        1. testWORD has attachment.docx
          60 kB
          Tim Allison
        2. Microsoft_Word_Document.docx
          12 kB
          Tim Allison
        3. image1.emf
          10 kB
          Tim Allison
        4. oleObject1.bin
          3 kB
          Tim Allison
        5. image2.emf
          10 kB
          Tim Allison
        6. image3.emf
          10 kB
          Tim Allison
        7. oleObject2.bin
          41 kB
          Tim Allison
        8. image-2023-02-06-15-46-05-678.png
          23 kB
          Ross Johnson
        9. image-2023-02-06-15-58-20-443.png
          34 kB
          Ross Johnson
        10. image1-1.emf
          8 kB
          Ross Johnson
        11. image1-2.emf
          8 kB
          Ross Johnson
        12. Inner Test Email.msg
          40 kB
          ChetanB
        13. symbol.docx
          12 kB
          ChetanB

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 5