Tika / TIKA-2148

Tika app is unable to parse a password-protected PowerPoint (97-2003) document

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.13
    • Fix Version/s: None
    • Component/s: cli
    • Environment: Windows console

    Description

      Extracting text from a password-protected PowerPoint 97-2003 document with the Tika command-line application fails, even when the password is supplied via --password. The command used was:

      java -jar tika-app-1.13.jar -t --password=password "This is password protected (Created with MS 2003).ppt"

      The following exception is thrown on the console:

      Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@62204612
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
      	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
      	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
      Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
      	at org.apache.poi.hslf.usermodel.HSLFSlideShowEncrypted.<init>(HSLFSlideShowEncrypted.java:106)
      	at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:284)
      	at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:275)
      	at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:179)
      	at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:182)
      	at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
      	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
      	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	... 5 more
      

      This happens with password-protected PPT files regardless of whether they were created with Office 2010, Office 2007, or Office 2003.
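
The trace above shows the real failure (EncryptedPowerPointFileException) wrapped inside a generic TikaException. Code that embeds Tika could tell this case apart from other parse errors by walking the cause chain. The following is only a minimal sketch using the JDK alone: the class CauseChain, its methods, and the nested EncryptedPowerPointFileException are hypothetical stand-ins introduced for illustration, and nothing here calls Tika or POI.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (not part of Tika or POI) for inspecting "Caused by:" chains. */
final class CauseChain {

    // Upper bound on the walk so a cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 32;

    /** Returns the fully qualified class names in the chain, outermost first. */
    static List<String> classNames(Throwable t) {
        List<String> names = new ArrayList<>();
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            names.add(t.getClass().getName());
            t = t.getCause();
        }
        return names;
    }

    /** True if any exception in the chain has the given simple class name. */
    static boolean hasCause(Throwable t, String simpleName) {
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            if (t.getClass().getSimpleName().equals(simpleName)) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /** Hypothetical stand-in for the POI exception named in the trace. */
    static class EncryptedPowerPointFileException extends RuntimeException {
        EncryptedPowerPointFileException(String message) {
            super(message);
        }
    }

    public static void main(String[] args) {
        // Mimic the shape of the reported failure: a generic wrapper around the real cause.
        Throwable wrapped = new Exception(
                "Unexpected RuntimeException from OfficeParser",
                new EncryptedPowerPointFileException("PowerPoint file is encrypted."));

        System.out.println(classNames(wrapped));
        System.out.println(hasCause(wrapped, "EncryptedPowerPointFileException"));
    }
}
```

Matching on the simple class name keeps the sketch free of a compile-time dependency on POI. Code that already has POI on the classpath could use an instanceof check against the real exception type instead.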

            People

              Assignee: Unassigned
              Reporter: Frank Refol (t3knoid)
              Votes: 0
              Watchers: 1
