Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
In TIKA-1994, we added the capability to run OCR on the full page of a PDF instead of on its inline images. The initial patch had only three OCR strategies: no_ocr, ocr_only, and ocr_and_text. Let's add other strategies that might improve performance (speed, accuracy, redundancy).
Issue Links
- is related to TIKA-1994 Integrate OCR with PDFParser (Resolved)