Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.12
Fix Version/s: None
Component/s: None
Environment: Windows, java version "1.8.0_73", 64 bit
Description
Hi everybody!
I'm trying to white-list a particular mime-type for OCR with the following config:
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
The idea is to enable the Tesseract parser for the PDF format only.
However, this configuration disables Tesseract completely.
Is this the expected behaviour or a bug?
Thank you!
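For reference, here is a small sketch that lists what the configuration above declares, using only the JDK's built-in DOM parser. The class `ConfigInspector` and its method `classesOf` are hypothetical names made up for this example and are not part of Tika. The sketch only reads the XML structure and does not reproduce how Tika resolves parsers at runtime. It does show that `TesseractOCRParser` appears only inside a `parser-exclude` element and is never declared as a `parser` of its own, which may be relevant to the behaviour described.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Tika): lists the classes a config file mentions.
class ConfigInspector {

    // The configuration from this report, inlined so the example is self-contained.
    public static final String CONFIG =
          "<properties><parsers>"
        + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
        + "<mime-exclude>application/pdf</mime-exclude>"
        + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
        + "</parser>"
        + "<parser class=\"org.apache.tika.parser.pdf.PDFParser\">"
        + "<mime>application/pdf</mime>"
        + "</parser>"
        + "</parsers></properties>";

    // Returns the "class" attribute of every element named tagName, in document order.
    public static List<String> classesOf(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("declared parsers: " + classesOf(CONFIG, "parser"));
        System.out.println("excluded parsers: " + classesOf(CONFIG, "parser-exclude"));
    }
}
```

Run against this config, the declared parsers are `DefaultParser` and `PDFParser`, and the only excluded parser is `TesseractOCRParser`.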