Description
For some Word (.doc, .docx) documents, the XHTML elements are not closed properly. This usually happens when link elements (<a>) are combined with bold or italic elements (<b>, <i>).
The fix should be made in https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
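
To make the symptom concrete, here is a minimal sketch of one way to keep inline elements balanced when a link ends while bold or italic runs are still open. It is not Tika's actual code: the class InlineTagBalancer and its methods are hypothetical names used only for illustration, and the sketch assumes a simple stack of open element names, with no attributes and no text escaping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Tika): remembers which inline elements
 * are open so that they are always closed innermost-first.
 */
public class InlineTagBalancer {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>();

    /** Emits an opening tag and records it as open. */
    public void start(String name) {
        out.append('<').append(name).append('>');
        open.push(name);
    }

    /** Appends text as-is (escaping is omitted in this sketch). */
    public void text(String s) {
        out.append(s);
    }

    /** Closes the named element and everything opened after it, innermost first. */
    public void end(String name) {
        if (!open.contains(name)) {
            return; // element is not open, so there is nothing to close
        }
        while (!open.isEmpty()) {
            String top = open.pop();
            out.append("</").append(top).append('>');
            if (top.equals(name)) {
                break;
            }
        }
    }

    public String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        InlineTagBalancer b = new InlineTagBalancer();
        b.start("a");
        b.start("b");
        b.start("i");
        b.text("link text");
        b.end("a"); // the link ends while <b> and <i> are still open
        System.out.println(b.result()); // prints <a><b><i>link text</i></b></a>
    }
}
```

If the bold or italic formatting continues past the end of the link, the caller would need to reopen those elements after the closing </a>. That step is left out here.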