[SOLR-1763] Integrate Solr Cell/Tika as an UpdateRequestProcessor - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: update
Labels:

Description

From Chris Hostetter's original post in solr-dev:

As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor – where it could be configured to take either binary fields (or string fields containing URLs) out of the Documents, parse them with Tika, and add the various XPath-matching hunks of text back into the document as new fields.

Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams, adds them as binary data fields, and adds the other literal params as fields.

Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML- and CSV-based updates, fairly trivial?

-Hoss

I couldn't agree more, so I decided to add it as an issue.
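
The flow proposed above (take a binary field out of the document, parse it, and add the extracted text back as new fields) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain Java maps instead of Solr's document classes, and the `Extractor` interface is a hypothetical stand-in for a Tika parse call. None of the names below are actual Solr or Tika API.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the proposed extracting update processor; not Solr or Tika API. */
public class ExtractingProcessorSketch {

    /** Assumed stand-in for a Tika parse: raw bytes in, named text fields out. */
    interface Extractor extends Function<byte[], Map<String, String>> {}

    private final String sourceField;
    private final Extractor extractor;

    ExtractingProcessorSketch(String sourceField, Extractor extractor) {
        this.sourceField = sourceField;
        this.extractor = extractor;
    }

    /** Returns a copy of doc with the binary source field replaced by the extracted fields. */
    Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object raw = out.get(sourceField);
        if (raw instanceof byte[]) {
            // Take the binary field out and add the extracted text back as new fields.
            out.remove(sourceField);
            out.putAll(extractor.apply((byte[]) raw));
        }
        // Documents without a binary value in sourceField pass through unchanged.
        return out;
    }

    public static void main(String[] args) {
        // Fake extractor for the demo: treats the bytes as UTF-8 text.
        Extractor fake = bytes ->
                Map.of("content", new String(bytes, StandardCharsets.UTF_8).trim());

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("raw", " hello from a binary field ".getBytes(StandardCharsets.UTF_8));

        ExtractingProcessorSketch processor = new ExtractingProcessorSketch("raw", fake);
        System.out.println(processor.process(doc));
    }
}
```

Under this design, a request handler would only need to attach the uploaded stream as a binary field. The same processor could then serve XML and CSV updates that carry such a field.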

Issue Links

is related to: SOLR-1526 Client Side Tika integration (Resolved)

People

Assignee: Unassigned
Reporter: Jan Høydahl
Votes: 1
Watchers: 3

Dates

Created: 08/Feb/10 20:41
Updated: 28/May/16 21:38
Resolved: 28/May/16 21:38