Description
A zero-byte (empty) file with a .doc extension is detected as a Word Document and the OfficeParser.parse method is called for this file.
We then get a TikaException, with the cause given as an org.apache.poi.EmptyFileException.
I think it would be more useful if a zero-byte file were NOT detected as a Word Document. The AutoDetectParser would then use whatever is configured as its fallback parser instead of calling OfficeParser.
That would let the user trigger special handling for empty files, rather than having to catch the TikaException and inspect its cause.
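Until detection changes, one possible workaround is for the caller to check for a zero-byte stream before handing it to the parser. The sketch below uses only the JDK and assumes the stream supports mark/reset. The class `EmptyStreamCheck` and the method `isEmpty` are hypothetical names for illustration and are not part of Tika.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class EmptyStreamCheck {

    // Hypothetical helper, not a Tika API: returns true if the stream has no bytes.
    // It peeks at one byte and then resets, so the caller can still parse the stream.
    public static boolean isEmpty(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);
        try {
            return in.read() == -1;
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        // Wrap in BufferedInputStream to guarantee mark/reset support.
        InputStream empty = new BufferedInputStream(new ByteArrayInputStream(new byte[0]));
        InputStream nonEmpty = new BufferedInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));

        System.out.println(isEmpty(empty));    // prints true
        System.out.println(isEmpty(nonEmpty)); // prints false
    }
}
```

A caller could run this check first and apply its own empty-file handling when it returns true, and pass the stream to the parser only otherwise.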
Issue Links
- is related to JCR-4240 IndexingQueueTest relies on Tika behavior that is changed in Tika 1.17 (Closed)