Lucene - Core / LUCENE-2899

Add OpenNLP Analysis capabilities as a module

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Now that OpenNLP is an ASF project and has a suitable license, it would be nice to have a submodule (under analysis) that exposes its capabilities. Drew Farris, Tom Morton and I have code that does:

      • Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
      • Named entity recognition as a TokenFilter
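
      To illustrate the sentence-detection item above, here is a minimal, self-contained sketch. It is not the code from the attached patches and it uses neither the Lucene nor the OpenNLP API: the JDK's `java.text.BreakIterator` stands in for an OpenNLP sentence model, and the class name `SentenceSpanSketch` and its `Span` type are hypothetical. It only shows the shape of the output such a Tokenizer would need: one span per sentence, with character offsets into the original text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: BreakIterator is a stand-in for an
// OpenNLP sentence detector, and Span is a stand-in for a Lucene token.
public class SentenceSpanSketch {

    /** A detected sentence plus its character offsets in the input. */
    public static final class Span {
        public final int start;   // inclusive offset
        public final int end;     // exclusive offset
        public final String text;

        public Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Splits text into sentence spans, trimming surrounding whitespace. */
    public static List<Span> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<Span> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int s = start;
            int e = end;
            // BreakIterator keeps trailing whitespace with the sentence; drop it
            // so the offsets cover only the sentence itself.
            while (s < e && Character.isWhitespace(text.charAt(s))) s++;
            while (e > s && Character.isWhitespace(text.charAt(e - 1))) e--;
            if (e > s) {
                out.add(new Span(s, e, text.substring(s, e)));
            }
        }
        return out;
    }
}
```

      A real Tokenizer would read from a Reader and expose the offsets through Lucene's attribute mechanism; the sketch works on a String to stay short.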

      We are also planning a Tokenizer/TokenFilter that can attach part-of-speech tags either as payloads on a token (PartOfSpeechAttribute?) or as separate tokens at the same position.
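
      The two part-of-speech options mentioned above differ in what ends up in the token stream. The sketch below contrasts them using plain Java types only. Everything in it is hypothetical: the class `PosTokenSketch`, its `Token` type, and the hard-coded tag map are illustration, not the planned implementation and not Lucene's API. In Lucene itself a payload is a byte sequence rather than a String, and "same position" corresponds to a position increment of 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: Token mimics the term, position increment,
// and payload of a Lucene token without depending on Lucene.
public class PosTokenSketch {

    public static final class Token {
        public final String term;
        public final int positionIncrement; // 0 means "same position as previous token"
        public final String payload;        // null when the token carries no payload

        public Token(String term, int positionIncrement, String payload) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.payload = payload;
        }
    }

    /** Option 1: one token per word, with the tag carried as a payload. */
    public static List<Token> tagsAsPayloads(List<String> words, Map<String, String> tags) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1, tags.get(word)));
        }
        return out;
    }

    /** Option 2: the tag is emitted as an extra token stacked on the word. */
    public static List<Token> tagsAtSamePosition(List<String> words, Map<String, String> tags) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1, null));
            String tag = tags.get(word);
            if (tag != null) {
                // Increment 0 places the tag at the same position as the word.
                out.add(new Token(tag, 0, null));
            }
        }
        return out;
    }
}
```

      With the first option the tag does not become a searchable term; with the second it does, at the cost of extra terms in the index.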

      I'd propose it go under:
      modules/analysis/opennlp

      1. LUCENE-2899.patch
        247 kB
        Lance Norskog
      2. LUCENE-2899-RJN.patch
        317 kB
        Rene Nederhand
      3. OpenNLPFilter.java
        8 kB
        Em
      4. OpenNLPTokenizer.java
        6 kB
        Em


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              23
              Watchers:
              44
