TIKA-2302: Make handling of macros equivalent between VBA in MSOffice and JS in PDFs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The current default behavior is to extract VBA macros from MSOffice files but not to extract JS from PDFs. Now that we have a config for MSOffice files, I propose changing the default behavior to NOT extract VBA macros from MSOffice files. Users can opt in to extraction of macros via configuration.
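
      As a rough illustration of the opt-in, a tika-config file along the following lines could re-enable macro extraction. This is a sketch, not copied from the commit: it assumes the boolean parameter is named extractMacros and is accepted by both OfficeParser and OOXMLParser, in the spirit of the tika-config-macros.xml test resource listed in the activity below. Check the parameter name against the Tika version in use.

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <properties>
        <parsers>
          <!-- Keep the default parsers for every other format -->
          <parser class="org.apache.tika.parser.DefaultParser"/>
          <!-- Assumed parameter name: extractMacros (off by default after this change) -->
          <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
              <param name="extractMacros" type="bool">true</param>
            </params>
          </parser>
          <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
              <param name="extractMacros" type="bool">true</param>
            </params>
          </parser>
        </parsers>
      </properties>
      ```

      With no such configuration, the proposed default leaves macros out of the extracted output, matching the existing PDF behavior for JavaScript.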

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build tika-2.x-windows #186 (See https://builds.apache.org/job/tika-2.x-windows/186/)
          TIKA-2302 – make macro extraction configurable and set default to false (tallison: rev 1826112e6c3bfd4001cef896279263ccbe0a1923)

          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/tika-config-sax-docx.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/AbstractOfficeParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-sax-macros.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/tika-config-macros.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-dom-macros.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
          • (edit) CHANGES.txt
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build tika-2.x #234 (See https://builds.apache.org/job/tika-2.x/234/)
          TIKA-2302 – make macro extraction configurable and set default to false (tallison: rev 1826112e6c3bfd4001cef896279263ccbe0a1923)

          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-sax-macros.xml
          • (edit) CHANGES.txt
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/tika-config-macros.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/AbstractOfficeParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-dom-macros.xml
          • (add) tika-test-resources/src/test/resources/org/apache/tika/parser/microsoft/tika-config-sax-docx.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Tika-trunk #1233 (See https://builds.apache.org/job/Tika-trunk/1233/)
          TIKA-2302 – make extraction of macros optional in OfficeParsers and set (tallison: https://github.com/apache/tika/commit/19c0e916982174da20ee98196db840c7465471eb)

          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
          • (add) tika-parsers/src/test/resources/org/apache/tika/parser/microsoft/tika-config-macros.xml
          • (edit) CHANGES.txt
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (add) tika-parsers/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-sax-macros.xml
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/AbstractOfficeParser.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
          • (add) tika-parsers/src/test/resources/org/apache/tika/parser/microsoft/ooxml/tika-config-dom-macros.xml

          TIKA-2302 – make extraction of macros optional in OfficeParsers and set (tallison: https://github.com/apache/tika/commit/5877c4c8702a10a76b6c3ee59fbae7daf3c9b062)

          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
          tallison@mitre.org Tim Allison added a comment -

          The default for both MSOffice and PDF is now not to extract macros/JavaScript.


            People

            • Assignee:
              Unassigned
              Reporter:
              tallison@mitre.org Tim Allison
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:
