Bug 54725 - NullPointerException parsing ms doc file
Summary: NullPointerException parsing ms doc file
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-19 15:45 UTC by Martin Kalcher
Modified: 2014-07-31 15:42 UTC

Attachments
Tika failed to parse this doc file. (233.50 KB, application/msword)
2013-03-19 15:45 UTC, Martin Kalcher

Description Martin Kalcher 2013-03-19 15:45:22 UTC
Created attachment 30079
Tika failed to parse this doc file.

I am coming here from: https://issues.apache.org/jira/browse/TIKA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605364#comment-13605364

I get a NullPointerException when parsing an MS Word .doc file using Tika.

% java -Djava.awt.headless=false -jar tika-app-1.3.jar -t < test.doc       
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2443906f
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:139)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:400)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: java.lang.NullPointerException
	at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:48)
	at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
	at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)

The test.doc file is attached.
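
In the trace above, the TikaException is only a wrapper. The "Caused by" entry is raised inside org.apache.poi.hwpf, which is why this report was filed against POI rather than Tika. As a rough illustration of that triage step, here is a minimal JDK-only sketch that walks an exception's cause chain and reports the class where the innermost exception was raised. The class name RootCauseDemo, both helper methods, and the hand-built stand-in exceptions are hypothetical; they are not part of Tika or POI, and no real parsing takes place.

```java
public class RootCauseDemo {

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Class name of the frame that raised the throwable, or "" if no frames are recorded. */
    public static String originClass(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        return frames.length == 0 ? "" : frames[0].getClassName();
    }

    public static void main(String[] args) {
        // Stand-in for the NullPointerException in the report; the frame is
        // copied from the trace by hand, not produced by POI.
        NullPointerException inner = new NullPointerException();
        inner.setStackTrace(new StackTraceElement[] {
            new StackTraceElement(
                "org.apache.poi.hwpf.sprm.CharacterSprmUncompressor",
                "uncompressCHP",
                "CharacterSprmUncompressor.java",
                48)
        });

        // Stand-in for the wrapping exception (a plain Exception here, so the
        // sketch needs no Tika classes on the classpath).
        Exception wrapper = new Exception("Unexpected RuntimeException from parser", inner);

        Throwable root = rootCause(wrapper);
        System.out.println(root.getClass().getName()); // java.lang.NullPointerException
        System.out.println(originClass(root));         // org.apache.poi.hwpf.sprm.CharacterSprmUncompressor
    }
}
```

Code calling Tika could apply the same walk to a caught exception to see which library the failure originates in before deciding where to report it.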
Comment 1 Nick Burch 2014-07-31 15:42:04 UTC
Fixed in r1614926.