TIKA-2090: Extract JavaScript from PDActions in PDFs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.15
    • Component/s: parser
    • Labels: None

      Description

      We're now extracting macros from msoffice files (TIKA-2069). We should do the equivalent for PDFs.
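For readers unfamiliar with what is being extracted: in the PDF format, a JavaScript action is a dictionary whose /S entry is /JavaScript and whose /JS entry holds the script, either as a string or as a stream. The sketch below is a crude, standard-library-only scan for inline /JS literal strings in uncompressed bytes. It is not the approach taken for this issue, which works on PDActions through PDFBox; the class name `InlineJsScan` and the method `findInlineScripts` are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds literal strings that directly follow a /JS key. */
public class InlineJsScan {

    public static List<String> findInlineScripts(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> scripts = new ArrayList<>();
        int from = 0;
        while (true) {
            int key = s.indexOf("/JS", from);
            if (key < 0) {
                break;
            }
            int i = key + 3;
            from = i;
            // Skip whitespace between the key and its value.
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            // Anything other than '(' is skipped: an indirect reference
            // to a stream (/JS 12 0 R), a hex string, or a longer name.
            if (i >= s.length() || s.charAt(i) != '(') {
                continue;
            }
            i++; // move past the opening parenthesis
            StringBuilder sb = new StringBuilder();
            int depth = 1; // PDF literal strings may contain balanced parentheses
            while (i < s.length() && depth > 0) {
                char c = s.charAt(i++);
                if (c == '\\' && i < s.length()) {
                    char e = s.charAt(i++);
                    switch (e) {
                        case 'n': sb.append('\n'); break;
                        case 'r': sb.append('\r'); break;
                        case 't': sb.append('\t'); break;
                        default:  sb.append(e); // \( \) \\ ; octal escapes are not decoded
                    }
                } else if (c == '(') {
                    depth++;
                    sb.append(c);
                } else if (c == ')') {
                    depth--;
                    if (depth > 0) {
                        sb.append(c);
                    }
                } else {
                    sb.append(c);
                }
            }
            if (depth == 0) {
                scripts.add(sb.toString()); // only keep properly terminated strings
            }
            from = i;
        }
        return scripts;
    }

    public static void main(String[] args) {
        String fragment = "<< /S /JavaScript /JS (app.alert\\(\"hi\"\\);) >>";
        // prints [app.alert("hi");]
        System.out.println(findInlineScripts(fragment.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

A scan like this misses scripts stored in streams, in compressed object streams, or in encrypted files, which is why a real extractor needs a full PDF object model rather than byte matching.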

          Activity

          tallison@mitre.org Tim Allison added a comment -

          How hard could it be? http://stackoverflow.com/questions/34840299/finding-javascript-code-in-pdf-using-apache-pdfbox

          tallison@mitre.org Tim Allison added a comment -

          Fixed for extraction from common locations.

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build tika-2.x-windows #81 (See https://builds.apache.org/job/tika-2.x-windows/81/)
          TIKA-2090: Allow extraction of PDActions (including Javascript) from (tallison: rev 300100fcb9a39e8997764e0d3b3ecd0d213c7824)

          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
          • (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java
          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
          • (edit) CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Tika-trunk #1149 (See https://builds.apache.org/job/Tika-trunk/1149/)
          TIKA-2090 – first draft (tallison: rev 7fbf0f304d8e20f7e26baadee1f85974b03dee8e)

          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
          • (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java

          TIKA-2090 – add more areas where javascript might live and add ability (tallison: rev 4dd6fd11035c09070689471975cc661aafa77333)

          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
          • (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java

          TIKA-2090 – add ability to extract PDActions from PDF files (tallison: rev 99b59243756d08124497686642d559f31d549543)

          • (edit) CHANGES.txt
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build tika-2.x #180 (See https://builds.apache.org/job/tika-2.x/180/)
          TIKA-2090: Allow extraction of PDActions (including Javascript) from (tallison: rev 300100fcb9a39e8997764e0d3b3ecd0d213c7824)

          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
          • (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java
          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
          • (edit) tika-parser-modules/tika-parser-multimedia-module/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
          • (edit) CHANGES.txt

            People

            • Assignee: Unassigned
            • Reporter: tallison@mitre.org Tim Allison
            • Votes: 0
            • Watchers: 2
