Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 1.13
- None
- Windows console.
Description
Using the Tika command-line application to extract text from a password-protected PowerPoint 97-2003 document fails, even though the password is supplied on the command line. This is the command that was used:
java -jar tika-app-1.13.jar -t --password=password "This is password protected (Created with MS 2003).ppt"
The following exception is thrown on the console:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@62204612
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
    at org.apache.poi.hslf.usermodel.HSLFSlideShowEncrypted.<init>(HSLFSlideShowEncrypted.java:106)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:284)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:275)
    at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:179)
    at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:182)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
Note that this happens regardless of whether the PPT file was created using Office 2010, Office 2007, or Office 2003.
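For anyone who would rather reproduce this from a small program than from the shell, the following is a minimal sketch that launches the same command with only the Java standard library. The class name `TikaRepro` and the helper `buildCommand` are made up for illustration; the jar name, password, and file name are the ones quoted in the command above and are assumed to be in the working directory.

```java
import java.util.List;

public class TikaRepro {
    // Hypothetical helper: assembles the argument list for the command quoted above.
    // Each argument is its own list element, so the spaces and parentheses in the
    // document name need no shell quoting.
    public static List<String> buildCommand(String jarPath, String password, String document) {
        return List.of("java", "-jar", jarPath, "-t", "--password=" + password, document);
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand(
                "tika-app-1.13.jar",
                "password",
                "This is password protected (Created with MS 2003).ppt");
        // inheritIO() forwards the child's stdout/stderr, so any stack trace
        // appears on this console just as it would when run by hand.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("exit code: " + process.waitFor());
    }
}
```

Passing the arguments as a list rather than a single command string sidesteps console-specific quoting, which helps rule out the Windows console itself as a factor.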
Attachments
Issue Links
- is related to: TIKA-1761 Error Parsing PPT (97-2003) files with password protection against modification which were created using Office 2013 (Open)