Created attachment 30079
Tika failed to parse this doc file.

I am coming here from:
https://issues.apache.org/jira/browse/TIKA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605364#comment-13605364

I get a NullPointerException when parsing an MS Word .doc file with Tika.

% java -Djava.awt.headless=false -jar tika-app-1.3.jar -t < test.doc
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2443906f
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:139)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:400)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: java.lang.NullPointerException
    at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:48)
    at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
    at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)

The test.doc file is attached.
Fixed in r1614926.