Description
Text inside a textbox is not extracted from .docx files, whether the textbox sits in the document body, a header, or a footer. This happens with any parser (including AutoDetectParser) combined with any type of ContentHandler. This is NOT a duplicate of TIKA-904: this issue specifically concerns the .docx file format.
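The report does not include a reproduction. The sketch below is a possible diagnostic aid that does not use Tika. It checks whether a given .docx actually contains textbox text, and in which part. It assumes textbox content is stored in `w:txbxContent` elements inside `word/document.xml`, `word/header*.xml` and `word/footer*.xml`. The class and method names (`TextboxText`, `textboxTexts`, `docxTextboxTexts`) are hypothetical and not part of Tika. Comparing its output with what a ContentHandler receives from AutoDetectParser would show which textbox text is being dropped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical diagnostic helper: lists textbox text found in a .docx, without Tika. */
public class TextboxText {
    // WordprocessingML main namespace (ECMA-376).
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the concatenated w:t text of each w:txbxContent element in one XML part.
     * Caveats: paragraph breaks are not preserved, a nested textbox is also counted inside
     * its parent, and a textbox saved with both a DrawingML and a VML fallback version
     * (mc:AlternateContent) is reported twice.
     */
    static List<String> textboxTexts(byte[] partXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs so an untrusted file cannot trigger external entity resolution.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(partXml));
            NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < boxes.getLength(); i++) {
                // Collect every w:t text element inside this textbox, in document order.
                NodeList textElems = ((Element) boxes.item(i)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < textElems.getLength(); j++) {
                    sb.append(textElems.item(j).getTextContent());
                }
                texts.add(sb.toString());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML part", e);
        }
    }

    /** Maps each body/header/footer part name to the textbox texts it contains. */
    static Map<String, List<String>> docxTextboxTexts(Path docx) throws IOException {
        Map<String, List<String>> result = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // The three locations named in the report: body, headers and footers.
                if (name.equals("word/document.xml")
                        || name.matches("word/(header|footer)\\d*\\.xml")) {
                    // readAllBytes() stops at the end of the current zip entry.
                    List<String> texts = textboxTexts(zip.readAllBytes());
                    if (!texts.isEmpty()) {
                        result.put(name, texts);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TextboxText.java file.docx");
            return;
        }
        docxTextboxTexts(Path.of(args[0]))
                .forEach((part, texts) -> System.out.println(part + ": " + texts));
    }
}
```

Run it with `java TextboxText.java sample.docx` (Java 11 or later). If it prints textbox text that never reaches the ContentHandler, the text is present in the file but lost during extraction.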