TIKA-2246: Extract files embedded within EMF files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.15
    • Component/s: None
    • Labels:
      None

      Description

      Andreas Beeker recently added code to POI to extract PDFs that are embedded inside EMF files. On POI-60570, I added a parser for EMFs that also accomplishes this. Once we upgrade to the next version of POI, let's add extraction of file objects embedded within EMFs.
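
To illustrate the idea of a file living inside an EMF, here is a minimal, standard-library-only sketch. It is not the POI or Tika implementation. It assumes only the record layout from the EMF format (each record starts with a little-endian 4-byte type and a 4-byte size, and an EMR_COMMENT record, type 0x46, carries a 4-byte data size followed by opaque data) and assumes the embedded PDF can be recognized by its "%PDF-" magic. The class name EmfCommentScanner and the helper syntheticEmf are hypothetical, and the synthetic input is deliberately not a valid EMF.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the POI/Tika parser.
public class EmfCommentScanner {
    static final int EMR_HEADER = 0x01;
    static final int EMR_EOF = 0x0E;
    static final int EMR_COMMENT = 0x46;
    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Walks the record stream and returns comment payloads that contain the PDF magic. */
    static List<byte[]> findPdfPayloads(byte[] emf) {
        List<byte[]> hits = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        while (pos + 8 <= emf.length) {
            int type = buf.getInt(pos);
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            if (size < 8 || pos + size > emf.length) {
                break; // malformed or truncated record: stop scanning
            }
            if (type == EMR_COMMENT && size >= 12) {
                long dataSize = buf.getInt(pos + 8) & 0xFFFFFFFFL;
                if (dataSize <= size - 12) {
                    byte[] data = Arrays.copyOfRange(emf, pos + 12, pos + 12 + (int) dataSize);
                    int start = indexOf(data, PDF_MAGIC);
                    if (start >= 0) {
                        hits.add(Arrays.copyOfRange(data, start, data.length));
                    }
                }
            }
            if (type == EMR_EOF) {
                break;
            }
            pos += (int) size;
        }
        return hits;
    }

    /** Naive byte-array search; returns -1 when the needle is absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Builds a tiny record stream (stub header, one comment, stub EOF) for demonstration only. */
    static byte[] syntheticEmf(byte[] payload) {
        int padded = (payload.length + 3) & ~3; // pad the comment data to a 4-byte boundary
        ByteBuffer buf = ByteBuffer.allocate(8 + 12 + padded + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(EMR_HEADER).putInt(8);
        buf.putInt(EMR_COMMENT).putInt(12 + padded).putInt(payload.length);
        buf.put(payload);
        buf.position(buf.position() + (padded - payload.length));
        buf.putInt(EMR_EOF).putInt(8);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> hits = findPdfPayloads(syntheticEmf(pdf));
        System.out.println("Found " + hits.size() + " embedded PDF payload(s)");
    }
}
```

A real parser would validate the header record and handle the structured comment types the format defines; this sketch treats every comment as opaque bytes.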

          Activity

          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Tika-trunk #1197 (See https://builds.apache.org/job/Tika-trunk/1197/)
          TIKA-2247 and TIKA-2246 – add parsers for EMF/WMF (tallison: rev b9befb4272cf8b2bda3b3ea25b0511bbabfdeded)

          • (edit) CHANGES.txt
          • (add) tika-parsers/src/test/resources/test-documents/testEXCEL_embeddedPDF_windows.xlsx
          • (add) tika-parsers/src/test/resources/test-documents/testEXCEL_embeddedPDF_mac.xlsx
          • (edit) tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
          • (add) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/EMFParserTest.java
          • (add) tika-parsers/src/test/resources/test-documents/testEXCEL_embeddedPDF_mac.xls
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WMFParser.java
          • (add) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WMFParserTest.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/EMFParser.java
          • (add) tika-parsers/src/test/resources/test-documents/testEXCEL_embeddedPDF_windows.xls
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build tika-2.x #214 (See https://builds.apache.org/job/tika-2.x/214/)
          TIKA-2246 and TIKA-2247 -add parsers for EMF and WMF (tallison: rev 6bfe5d565bd3fbf55a538c39047294814cae0767)

          • (add) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WMFParserTest.java
          • (edit) tika-parser-bundles/tika-parser-office-bundle/src/test/java/org/apache/tika/module/office/BundleIT.java
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_embeddedPDF_mac.xlsx
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_embeddedPDF_mac.xls
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/WMFParser.java
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_embeddedPDF_windows.xls
          • (edit) CHANGES.txt
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/EMFParser.java
          • (add) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/EMFParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_embeddedPDF_windows.xlsx
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
          tallison@mitre.org Tim Allison added a comment -

          Many thanks to Andreas Beeker for showing that EMFs can contain embedded files!


            People

            • Assignee:
              Unassigned
              Reporter:
              tallison@mitre.org Tim Allison
            • Votes:
              0
              Watchers:
              2
