Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Tika should not be run in the same JVM as Solr. Ever.
Upgrading Tika while hoping to avoid jar hell, and getting all of the dependencies right by hand, is, um, error-prone. See my recent failure: SOLR-11622, for which I apologize profusely.
Running DIH against Tika's unit test documents has been eye-opening. It has revealed some other version conflict/dependency failures that should have been caught much earlier.
The fix is non-trivial, but we should work towards it.
I see two options:
1. TIKA-2514 – Our current ForkParser offers a model for a minimal fork process + server option. The limitation currently is that all parsers and dependencies must be serializable, which can be a problem for users adding their own parsers with deps that might not be designed for serializability. The proposal there is to rework the ForkParser to use a TIKA_HOME directory for all dependencies.
2. SOLR-7632 – use tika-server, but make it seamless and as easy (and secure!) to use as the current handlers.
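To make option 2 concrete, here is a rough sketch of what an out-of-process call could look like from the Solr side. It assumes a tika-server instance is already running and reachable (http://localhost:9998 is the server's usual default, but treat the address as an assumption), and it uses tika-server's `PUT /tika` endpoint with `Accept: text/plain`. The class and method names (`TikaServerClient`, `buildRequest`, `extractText`) are made up for illustration and are not part of Solr or Tika. It needs Java 11+ for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper: sends a document to a separately running tika-server
// so that no Tika parser code is loaded into the calling JVM.
final class TikaServerClient {

    // Builds a PUT request against tika-server's /tika endpoint.
    // serverBase is an assumption, e.g. http://localhost:9998
    static HttpRequest buildRequest(URI serverBase, byte[] document) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .timeout(Duration.ofSeconds(60))      // do not wait forever on a hung parse
                .header("Accept", "text/plain")       // ask for plain extracted text
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the document and returns the extracted text, or throws on a non-200 reply.
    static String extractText(HttpClient client, URI serverBase, byte[] document)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                buildRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TikaServerClient <file> [serverBaseUrl]");
            return;
        }
        URI base = URI.create(args.length > 1 ? args[1] : "http://localhost:9998");
        byte[] doc = Files.readAllBytes(Path.of(args[0]));
        System.out.println(extractText(HttpClient.newHttpClient(), base, doc));
    }
}
```

The point of the sketch is the process boundary: a parser that crashes, runs out of memory, or hangs only takes down (or times out against) the server process, and Solr's classpath never sees Tika's dependencies. The "seamless and secure" part of option 2, such as starting and supervising the server and restricting who can reach it, is not covered here.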
Other thoughts, recommendations?
Issue Links
- is related to
  - SOLR-12423 Upgrade to Tika 1.19.1 when available (Closed)
- relates to
  - SOLR-13973 Deprecate Tika (Open)
  - SOLR-7632 Change the ExtractingRequestHandler to use Tika-Server (Reopened)