[SOLR-987] Add a new DataImportHandler EntityProcessor to handle non-XML files - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: contrib - DataImportHandler
Labels: None

Description

We need a way to use the DataImportHandler to index non-XML files, i.e. plain text files, retrieved either over HTTP or from the file system. This would make it possible to put the entire contents of a text file into a single field of a document whose other fields are pulled from another DataSource. An EntityProcessor looks like the right place for this, since it would let us add more attributes if needed. We could also consider supporting other file formats (PDF, Office, etc.), which may overlap with some of the Extraction/Tika work.
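A minimal sketch, in plain Java with no Solr dependencies, of the core behaviour requested above: read everything from a text source into one field of a single row. The class `PlainTextRowSketch`, the method `readWholeText`, and the column name `plainText` are hypothetical illustrations, not the DataImportHandler API. In a real EntityProcessor the `Reader` would presumably be supplied by a file- or URL-based DataSource, and the row would be emitted once before signalling that no rows remain; that wiring is an assumption and is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Solr's API): turns the whole contents of a
 * plain-text source into a single-row, single-field record.
 */
public final class PlainTextRowSketch {

    private PlainTextRowSketch() {}

    /** Reads everything from {@code source} into one column of a row map, then closes it. */
    public static Map<String, Object> readWholeText(Reader source, String column) {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader in = source) {
            int n;
            // Copy the stream chunk by chunk until end of input.
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // One row with exactly one field holding the entire text.
        Map<String, Object> row = new HashMap<>();
        row.put(column, text.toString());
        return row;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file or HTTP stream in this sketch.
        Map<String, Object> row =
                readWholeText(new StringReader("line one\nline two\n"), "plainText");
        System.out.println(row.get("plainText"));
    }
}
```

Taking a `Reader` rather than a `Path` or `URL` keeps the sketch agnostic about whether the text arrives over HTTP or from the file system; the document's other fields would come from a separate entity backed by another DataSource.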

Issue Links

is a clone of: SOLR-980 A PlainTextEntityProcessor for DIH (Closed)

People

Assignee: Unassigned

Reporter: Nathan Adams

Votes: 0

Watchers: 0

Dates

Created: 26/Jan/09 17:12

Updated: 27/Jan/09 15:33

Resolved: 27/Jan/09 15:32