[SOLR-2129] Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.1, 4.0-ALPHA
Component/s: None
Labels: None

Description

Provide components that allow Apache UIMA automatic metadata extraction to be exploited when indexing documents.
The purpose is to take the unstructured information "inside" a document and turn it into structured metadata (as fields) that enriches each document.

Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can easily be extended by adding or selecting different UIMA analysis engines, either from UIMA repositories on the web or by creating new ones from scratch.
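The processor-chain idea described above can be illustrated with a small sketch. It is written in Python for brevity and is not Solr's API: Solr's real UpdateRequestProcessor is a Java class, and UIMA analysis engines are Java components. All names here (`UpdateProcessor`, `MetadataExtractionProcessor`, `IndexingProcessor`, the two toy engines, and the `sentence`/`entity` field names) are hypothetical, and the toy engines only stand in for the sentence tagger and named-entity extractor mentioned in the description.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a Solr input document: field name -> list of values.
Document = Dict[str, List[str]]

# Hypothetical stand-in for a UIMA analysis engine: text in, extracted fields out.
AnalysisEngine = Callable[[str], Dict[str, List[str]]]


class UpdateProcessor:
    """Minimal model of a chained update processor (hypothetical, not Solr's API)."""

    def __init__(self, next_processor: Optional["UpdateProcessor"] = None):
        self.next = next_processor

    def process_add(self, doc: Document) -> None:
        # Default behavior: pass the document on to the next processor in the chain.
        if self.next is not None:
            self.next.process_add(doc)


class MetadataExtractionProcessor(UpdateProcessor):
    """Runs each engine over a text field and adds the results as new fields."""

    def __init__(self, text_field: str, engines: List[AnalysisEngine],
                 next_processor: Optional[UpdateProcessor] = None):
        super().__init__(next_processor)
        self.text_field = text_field
        self.engines = engines

    def process_add(self, doc: Document) -> None:
        text = " ".join(doc.get(self.text_field, []))
        for engine in self.engines:
            # Merge each engine's output into the document as structured fields.
            for field, values in engine(text).items():
                doc.setdefault(field, []).extend(values)
        super().process_add(doc)


class IndexingProcessor(UpdateProcessor):
    """Terminal processor that records what would be indexed."""

    def __init__(self):
        super().__init__(None)
        self.indexed: List[Document] = []

    def process_add(self, doc: Document) -> None:
        self.indexed.append(doc)


def sentence_engine(text: str) -> Dict[str, List[str]]:
    # Toy sentence splitter on '.', standing in for a tokenizer + HMM tagger.
    return {"sentence": [s.strip() for s in text.split(".") if s.strip()]}


def capitalized_token_engine(text: str) -> Dict[str, List[str]]:
    # Toy "entity" heuristic: capitalized tokens, punctuation stripped, deduplicated.
    tokens = [t.strip(".,;:!?") for t in text.split()]
    return {"entity": list(dict.fromkeys(t for t in tokens if t[:1].isupper()))}


indexer = IndexingProcessor()
chain = MetadataExtractionProcessor(
    "text", [sentence_engine, capitalized_token_engine], indexer)
chain.process_add({"id": ["1"],
                   "text": ["Apache Solr indexes documents. UIMA extracts metadata."]})
print(indexer.indexed[0]["sentence"])  # ['Apache Solr indexes documents', 'UIMA extracts metadata']
print(indexer.indexed[0]["entity"])    # ['Apache', 'Solr', 'UIMA']
```

In this sketch, adding or swapping engines only changes the `engines` list, which loosely mirrors the extension point the description mentions: selecting different UIMA analysis engines without touching the rest of the indexing chain.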

More information can be found on the dedicated wiki page: http://wiki.apache.org/solr/SolrUIMA

Attachments

- lib-jars.zip | 25/Sep/10 09:18 | 6.80 MB | Tommaso Teofili
- SOLR-2129.patch | 03/Jan/11 16:44 | 200 kB | Robert Muir
- SOLR-2129.patch | 24/Sep/10 12:39 | 209 kB | Tommaso Teofili
- SOLR-2129-asf-headers.patch | 24/Sep/10 13:07 | 225 kB | Tommaso Teofili
- SOLR-2129-version2.patch | 14/Nov/10 09:37 | 212 kB | Tommaso Teofili
- SOLR-2129-version3.patch | 08/Dec/10 17:29 | 208 kB | Tommaso Teofili
- SOLR-2129-version-5.patch | 08/Jan/11 16:07 | 211 kB | Tommaso Teofili
- SOLR-2129-version-6.patch | 11/Jan/11 09:15 | 211 kB | Tommaso Teofili


People

Assignee:: Robert Muir

Reporter:: Tommaso Teofili

Votes:: 6

Watchers:: 8

Dates

Created:: 22/Sep/10 05:54

Updated:: 30/Mar/11 15:45

Resolved:: 24/Jan/11 02:22