Details
Improvement
Status: Closed
Major
Resolution: Fixed
5.3
None
Description
Currently there is no way to pass additional configuration when extracting documents with ExtractingRequestHandler/ExtractingDocumentLoader.
For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with "extractInlineImages" set to true into the ParseContext to trigger extraction and OCR of images embedded in PDFs.
It would be nice to be able to configure the created ParseContext through an XML config file, as TikaConfig does.
I suggest the following:
solrconfig.xml:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="parseContext.config">parseContext.config</str>
</requestHandler>
parseContext.config:
<entries>
  <entry class="org.apache.tika.parser.pdf.PDFParserConfig" value="org.apache.tika.parser.pdf.PDFParserConfig">
    <property name="extractInlineImages" value="true"/>
  </entry>
</entries>
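To make the proposal more concrete, here is a rough sketch of how a file in this format could be read. It is only an illustration under assumptions, not the implementation that was committed: the class ParseContextConfigSketch and its methods are hypothetical, and StubPdfConfig is a stand-in for PDFParserConfig so that the example runs without Tika on the classpath. Each entry's "class" attribute is treated as the ParseContext key, "value" as the class to instantiate, and each property is applied through a matching setter found by reflection.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParseContextConfigSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.pdf.PDFParserConfig. */
    public static class StubPdfConfig {
        private boolean extractInlineImages;
        public void setExtractInlineImages(boolean v) { extractInlineImages = v; }
        public boolean getExtractInlineImages() { return extractInlineImages; }
    }

    /** Parses an <entries> document and returns key class -> configured instance. */
    public static Map<Class<?>, Object> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<Class<?>, Object> entries = new LinkedHashMap<>();
        NodeList entryNodes = doc.getDocumentElement().getElementsByTagName("entry");
        for (int i = 0; i < entryNodes.getLength(); i++) {
            Element entry = (Element) entryNodes.item(i);
            // "class" is the lookup key, "value" is the class that gets instantiated.
            Class<?> key = Class.forName(entry.getAttribute("class"));
            Object value = Class.forName(entry.getAttribute("value"))
                    .getDeclaredConstructor().newInstance();
            NodeList props = entry.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                Element p = (Element) props.item(j);
                applyProperty(value, p.getAttribute("name"), p.getAttribute("value"));
            }
            entries.put(key, value);
        }
        return entries;
    }

    /** Calls setXxx(raw) on target, converting raw to boolean, int, or String. */
    private static void applyProperty(Object target, String name, String raw) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (!m.getName().equals(setter) || m.getParameterCount() != 1) {
                continue;
            }
            Class<?> type = m.getParameterTypes()[0];
            Object arg;
            if (type == boolean.class || type == Boolean.class) {
                arg = Boolean.parseBoolean(raw);
            } else if (type == int.class || type == Integer.class) {
                arg = Integer.parseInt(raw);
            } else if (type == String.class) {
                arg = raw;
            } else {
                continue; // unsupported parameter type in this sketch
            }
            m.invoke(target, arg);
            return;
        }
        throw new IllegalArgumentException("No usable setter for property " + name);
    }

    public static void main(String[] args) throws Exception {
        String cls = StubPdfConfig.class.getName();
        String xml = "<entries><entry class=\"" + cls + "\" value=\"" + cls + "\">"
                + "<property name=\"extractInlineImages\" value=\"true\"/></entry></entries>";
        StubPdfConfig cfg = (StubPdfConfig) load(xml).get(StubPdfConfig.class);
        System.out.println("extractInlineImages = " + cfg.getExtractInlineImages());
    }
}
```

With Tika available, the stub would be dropped and each loaded pair could presumably be registered on the context through ParseContext.set(key, value) before the parser is invoked. Keeping "class" and "value" separate would let an interface serve as the key while a concrete subclass supplies the instance.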