Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
As described in TIKA-2360, we should refactor the ObjectRecogniser interface. I propose creating:
1. TextRecogniser: per thammegowda, INPUT: text; OUTPUT: a set of metadata key-value pairs.
2. ObjectRecogniser: also per Thamme, this covers ObjectRecogniser, VideoLabeller, OCR and Caption. INPUT: raw bytes; OUTPUT: a set of metadata key-value pairs.
We should of course reconcile this with Tika-DL and how it folds in.
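
To make the proposed split concrete, here is a rough, standalone sketch of what the two interfaces might look like. Everything in it is an assumption for discussion: the method name `recognise`, the use of a plain `Map<String, String>` in place of Tika's own metadata type, and the two toy implementations (`WordCountRecogniser`, `ByteLengthRecogniser`) are hypothetical and are not taken from the existing code base or from TIKA-2360.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: not the existing Tika interfaces.
// A plain Map stands in for the "set of metadata key values".

/** INPUT: text. OUTPUT: metadata key-value pairs. */
interface TextRecogniser {
    Map<String, String> recognise(String text);
}

/** INPUT: raw bytes (image, video frame, ...). OUTPUT: metadata key-value pairs. */
interface ObjectRecogniser {
    Map<String, String> recognise(byte[] rawBytes);
}

/** Toy text recogniser: reports the whitespace-separated word count. */
final class WordCountRecogniser implements TextRecogniser {
    @Override
    public Map<String, String> recognise(String text) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String trimmed = text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        metadata.put("word.count", Integer.toString(words));
        return metadata;
    }
}

/** Toy object recogniser: reports the size of the raw input in bytes. */
final class ByteLengthRecogniser implements ObjectRecogniser {
    @Override
    public Map<String, String> recognise(byte[] rawBytes) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("byte.length", Integer.toString(rawBytes.length));
        return metadata;
    }
}

class RecogniserSketch {
    public static void main(String[] args) {
        TextRecogniser text = new WordCountRecogniser();
        ObjectRecogniser object = new ByteLengthRecogniser();
        System.out.println(text.recognise("refactor the interface"));
        System.out.println(object.recognise(new byte[] {1, 2, 3, 4}));
    }
}
```

The point of the sketch is only that both interfaces share the same output shape and differ in input type, which is what would let VideoLabeller, OCR and Caption implementations sit behind the bytes-based one. How this maps onto Tika-DL is left open here.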