Description
My current design plan calls for a separate component, "tika-batch", that contains a small amount of configurable code to run Tika against a large batch of documents. This code should be robust against OOM errors and hangs, and it should log thoroughly enough that failures on individual documents can be traced.
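As a rough illustration of the "robust against hangs" requirement, the sketch below runs each document through an extractor on a worker thread with a per-document timeout, logs failures, and replaces the worker when a document hangs. It is not the tika-batch design itself: `BatchRunnerSketch`, `runBatch`, `Status`, and the `extractor` function are hypothetical names, and the extractor stands in for whatever call actually invokes Tika. An in-process watchdog like this cannot fully protect against OOM; isolating the parser in a separate process is a common way to handle that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical sketch: per-document timeout and failure logging for a batch run.
final class BatchRunnerSketch {
    private static final Logger LOG = Logger.getLogger(BatchRunnerSketch.class.getName());

    enum Status { OK, TIMEOUT, FAILED }

    // Daemon worker so a thread stuck on a hung document cannot block JVM exit.
    private static ExecutorService newWorker() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extract-worker");
            t.setDaemon(true);
            return t;
        });
    }

    // 'extractor' is a placeholder for the code that would call Tika (an assumption).
    static List<Status> runBatch(List<String> docs,
                                 Function<String, String> extractor,
                                 long timeoutMillis) {
        List<Status> results = new ArrayList<>();
        ExecutorService worker = newWorker();
        try {
            for (String doc : docs) {
                Callable<String> task = () -> extractor.apply(doc);
                Future<String> future = worker.submit(task);
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    results.add(Status.OK);
                } catch (TimeoutException e) {
                    LOG.warning("Timed out after " + timeoutMillis + " ms: " + doc);
                    future.cancel(true);
                    // Abandon the possibly stuck worker and start a fresh one.
                    worker.shutdownNow();
                    worker = newWorker();
                    results.add(Status.TIMEOUT);
                } catch (ExecutionException e) {
                    // The cause may be an exception or an Error thrown by the worker.
                    LOG.warning("Failed on " + doc + ": " + e.getCause());
                    results.add(Status.FAILED);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            worker.shutdownNow();
        }
        return results;
    }
}
```

A caller would pass the list of document paths, a function wrapping the real extraction call, and a timeout in milliseconds, then inspect the returned statuses to decide which documents to retry or skip.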
Attachments
Issue Links
- is related to: OAK-2953 Implement text extractor as part of oak-run (Closed)