Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version: 1.13
- None
- None
- Environment: Windows 7 x64, JVM 1.8.0_101
Description
Parsing the following valid Word file with Tika:
https://dl.dropboxusercontent.com/u/92341073/VTEU_ICPD_Bacteremia_Concept_Submitted_23Jul08.doc
throws the following error:
java.lang.IllegalArgumentException: This paragraph is not the first one in the table
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:925)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:241)
    at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:227)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:162)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
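
When checking whether a trace like the one above matches another report, it can help to reduce it to the exception type plus the topmost frame. The following is a minimal sketch using only the Java standard library; the class `TraceSignature` and its method `of` are hypothetical helpers written for illustration and are not part of Tika or POI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceSignature {
    // Matches frames such as "at pkg.Class.method(File.java:123)".
    private static final Pattern FRAME =
            Pattern.compile("^\\s*at\\s+([\\w.$]+)\\(([^)]*)\\)\\s*$");

    /** Returns "ExceptionType @ topFrame", or null if the trace contains no frame. */
    public static String of(String trace) {
        String[] lines = trace.split("\\R");
        // First line looks like "java.lang.X: message"; keep only the type.
        String first = lines[0].trim();
        int colon = first.indexOf(':');
        String type = colon >= 0 ? first.substring(0, colon) : first;
        for (int i = 1; i < lines.length; i++) {
            Matcher m = FRAME.matcher(lines[i]);
            if (m.matches()) {
                return type + " @ " + m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String trace = String.join("\n",
            "java.lang.IllegalArgumentException: This paragraph is not the first one in the table",
            "at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:925)",
            "at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:241)");
        System.out.println(of(trace));
    }
}
```

Two reports that yield the same signature are candidates for being duplicates, although the deeper frames may still differ, so this is only a first filter.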
Issue Links
- is duplicated by: TIKA-1733 RuntimeException when parsing some word (.doc) documents (Open)