SOLR-2129

Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      Provide components that let Solr use Apache UIMA's automatic metadata extraction when indexing documents.
      The goal is to take the unstructured information inside a document and turn it into structured metadata, stored as fields, that enriches each document.

      This can be done with a custom UpdateRequestProcessor that triggers UIMA while documents are being indexed.
      The basic UIMA implementation of UpdateRequestProcessor extracts sentences (using a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (using external services from OpenCalais and AlchemyAPI). The implementation can easily be extended by adding or selecting different UIMA analysis engines, either taken from UIMA repositories on the web or written from scratch.
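
      The processor pattern described above can be illustrated with a minimal, language-neutral sketch. It is not the code from the attached patches: real Solr processors are Java classes, and every name here (UpdateProcessor, MetadataExtractionProcessor, split_sentences, capitalized_terms, and the "sentence" and "entity" fields) is hypothetical. The two toy annotators only stand in for UIMA analysis engines so that the flow is visible: read a text field, run each annotator, store the results as new fields, then hand the document to the next processor in the chain.

```python
import re
from typing import Callable, Dict, List, Optional

Document = Dict[str, object]
# An annotator maps raw text to {field name: extracted values}.
Annotator = Callable[[str], Dict[str, List[str]]]


class UpdateProcessor:
    """Hypothetical chain link: hands the document to the next processor."""

    def __init__(self, next_processor: Optional["UpdateProcessor"] = None) -> None:
        self.next_processor = next_processor

    def process_add(self, doc: Document) -> Document:
        if self.next_processor is not None:
            return self.next_processor.process_add(doc)
        return doc


class MetadataExtractionProcessor(UpdateProcessor):
    """Runs annotators over one text field and stores results as new fields."""

    def __init__(
        self,
        source_field: str,
        annotators: List[Annotator],
        next_processor: Optional[UpdateProcessor] = None,
    ) -> None:
        super().__init__(next_processor)
        self.source_field = source_field
        self.annotators = annotators

    def process_add(self, doc: Document) -> Document:
        text = doc.get(self.source_field)
        if isinstance(text, str) and text.strip():
            for annotate in self.annotators:
                for field, values in annotate(text).items():
                    # Append to any values already present in the field.
                    existing = doc.get(field, [])
                    if not isinstance(existing, list):
                        existing = [existing]
                    doc[field] = existing + list(values)
        # Always continue down the chain, enriched or not.
        return super().process_add(doc)


def split_sentences(text: str) -> Dict[str, List[str]]:
    """Toy stand-in for a sentence annotator (assumption, not UIMA)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return {"sentence": [p for p in parts if p]}


def capitalized_terms(text: str) -> Dict[str, List[str]]:
    """Toy stand-in for a named-entity annotator (assumption, not UIMA)."""
    seen: List[str] = []
    for term in re.findall(r"\b[A-Z][A-Za-z]+\b", text):
        if term not in seen:
            seen.append(term)
    return {"entity": seen}


if __name__ == "__main__":
    doc = {"id": "1", "text": "Solr indexes documents. UIMA extracts metadata."}
    processor = MetadataExtractionProcessor(
        "text", [split_sentences, capitalized_terms]
    )
    print(processor.process_add(doc))
```

      Keeping the annotators as plain callables mirrors the extensibility point made above: swapping or adding an analysis step changes only the list passed to the processor, not the processor itself.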

      More information can be found on the dedicated wiki page: http://wiki.apache.org/solr/SolrUIMA

      1. SOLR-2129-version-6.patch
        211 kB
        Tommaso Teofili
      2. SOLR-2129-version-5.patch
        211 kB
        Tommaso Teofili
      3. SOLR-2129-version3.patch
        208 kB
        Tommaso Teofili
      4. SOLR-2129-version2.patch
        212 kB
        Tommaso Teofili
      5. SOLR-2129-asf-headers.patch
        225 kB
        Tommaso Teofili
      6. SOLR-2129.patch
        209 kB
        Tommaso Teofili
      7. SOLR-2129.patch
        200 kB
        Robert Muir
      8. lib-jars.zip
        6.80 MB
        Tommaso Teofili


          People

          • Assignee:
            Robert Muir
            Reporter:
            Tommaso Teofili
          • Votes:
            6
            Watchers:
            8

