Lucene - Core

Add OpenNLP Analysis capabilities as a module

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:

      • Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
      • NamedEntity recognition as a TokenFilter

      We are also planning a Tokenizer/TokenFilter that can add parts of speech to a token either as payloads (PartOfSpeechAttribute?) or as separate tokens at the same position.

      I'd propose it go under:
      modules/analysis/opennlp

        Attachments

      1. OpenNLPTokenizer.java
        6 kB
        Em
      2. OpenNLPFilter.java
        8 kB
        Em
      3. LUCENE-2899-RJN.patch
        317 kB
        Rene Nederhand
      4. LUCENE-2899.patch
        247 kB
        Lance Norskog

        Issue Links

          Activity

          Joern Kottmann added a comment -

          The first release is now out. I guess you will use Maven for dependency management; here is how to add the released version as a dependency:
          http://incubator.apache.org/opennlp/maven-dependency.html

          Lance Norskog added a comment - - edited

          This is a patch for the trunk (as of a few days ago) that supplies the OpenNLP Sentence Detector, Tokenizer, Parts-of-Speech, Chunking and Named Entity Recognition tools.

          This has nothing to do with the code mentioned above.
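          (For readers new to OpenNLP: the sketch below shows roughly what driving the underlying OpenNLP 1.5.x sentence detector and tokenizer directly looks like. It is only an illustration with placeholder model paths, not code from the patch.)

          import java.io.FileInputStream;
          import java.io.InputStream;

          import opennlp.tools.sentdetect.SentenceDetectorME;
          import opennlp.tools.sentdetect.SentenceModel;
          import opennlp.tools.tokenize.TokenizerME;
          import opennlp.tools.tokenize.TokenizerModel;

          public class OpenNlpSketch {
            public static void main(String[] args) throws Exception {
              // The model files are downloaded separately; these paths are placeholders.
              InputStream sentIn = new FileInputStream("en-sent.bin");
              InputStream tokIn = new FileInputStream("en-token.bin");
              try {
                SentenceDetectorME sentenceDetector = new SentenceDetectorME(new SentenceModel(sentIn));
                TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));
                String text = "OpenNLP is now an ASF project. It has a nice license.";
                for (String sentence : sentenceDetector.sentDetect(text)) {
                  // Note: OpenNLP keeps punctuation as separate tokens.
                  for (String token : tokenizer.tokenize(sentence)) {
                    System.out.println(token);
                  }
                }
              } finally {
                sentIn.close();
                tokIn.close();
              }
            }
          }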

          Lance Norskog added a comment -

          Notes for a Wiki page:

          OpenNLP Integration

          What is the integration? The first integration is a Tokenizer and three Filters.

          • The OpenNLPTokenizer uses the OpenNLP SentenceDetector and Tokenizer tools instead of the standard Lucene Tokenizers. This requires statistical model files. One quirk of these is that all punctuation is maintained.
          • The OpenNLPFilter implements Parts-of-Speech tagging, Chunking (finding noun/verb phrases), and Named Entity Recognition (tagging people, place names etc.). This filter will add all tags as payload attributes to the tokens.
          • The FilterPayloadsFilter removes tokens by checking their payloads. Given a list of payloads, it will either keep only tokens with one of those payloads, or remove only the matching tokens and keep the rest. (This filter maintains position increments correctly; a sketch of the idea appears after this list.)
          • The StripPayloadsFilter removes payloads from Tokens.
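          (To make the FilterPayloadsFilter idea above concrete, here is a minimal sketch of a payload-based keep/remove filter written against the Lucene 4.x TokenFilter API. It is not the patch's implementation; the class and field names are made up, and a production version would also fold any trailing skipped positions into end().)

          import java.io.IOException;
          import java.util.Set;

          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
          import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
          import org.apache.lucene.util.BytesRef;

          /** Keeps (or drops) tokens whose payload matches one of the given values. */
          final class PayloadKeepFilter extends TokenFilter {
            private final Set<BytesRef> payloads;
            private final boolean keepMatching;
            private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
            private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);

            PayloadKeepFilter(TokenStream input, Set<BytesRef> payloads, boolean keepMatching) {
              super(input);
              this.payloads = payloads;
              this.keepMatching = keepMatching;
            }

            @Override
            public boolean incrementToken() throws IOException {
              int skipped = 0;
              while (input.incrementToken()) {
                BytesRef payload = payloadAtt.getPayload();
                boolean matches = payload != null && payloads.contains(payload);
                if (matches == keepMatching) {
                  // Fold the position increments of the dropped tokens into this one.
                  posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + skipped);
                  return true;
                }
                skipped += posIncAtt.getPositionIncrement();
              }
              return false;
            }
          }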

          How do I get going?

          • Pull the latest trunk.
          • Apply the patch.
          • Download the models (everything that starts with 'en') from http://opennlp.sourceforge.net/models-1.5/ to contrib/opennlp/src/test-files/opennlp/solr/conf/opennlp/.
          • Download the OpenNLP distribution from http://opennlp.apache.org/cgi-bin/download.cgi (currently apache-opennlp-1.5.2-incubating-bin.tar.gz), unpack it, and copy the jar files from lib/ to solr/contrib/opennlp/lib.

          Now, go to trunk-dir/solr and run 'ant test-contrib'. It compiles against the libraries and uses the model files.
          Next, run 'ant example', cd to the example directory and run 'java -Dsolr.solr.home=opennlp -jar start.jar'.
          You should now start without any Exceptions. At this point, go to the Schema analyzer, pick the 'text_opennlp_pos' field type, and post a sentence or two to the analyzer. You should get text tokenized with payloads. Unfortunately, the analysis page shows them as bytes instead of text; if you would like this fixed, go vote on SOLR-3493.

          Lance Norskog added a comment -

          About the build-

          1. This should be a Lucene module. I got lost trying to make the build work copying jars around, so it ended up in Solr/contrib.
          2. Downloading the jars. I don't know how to put together license validation with the OpenNLP Maven build. I think it takes some upgrading in the OpenNLP project.
          3. Why download the models from a separate place? The models are not Apache licensed. They are binaries derived from GNU- and otherwise licensed training data. The OpenNLP people archived them on Sourceforge.
          Lance Norskog added a comment -

          I consider the code and feature set mostly cooked as a first release. The toolkit as is lets you do two things:

          1. Do named entity recognition and filter out names for an autosuggest dictionary
          2. Pick nouns and verbs out of text and only index those. This gives you a field with a smaller, more focused set of terms. MoreLikeThis might work better.

          Please review it for design, bugs, code nits, whatever.

          Lance Norskog added a comment -

          An explanation about the OpenNLPUtil factory class: the statistical models are several megabytes apiece. This class loads them and caches them by file name. It does not reload them across commits.

          The models are immutable objects. The factory class creates another object that consults the model. There is one of these for each field analysis.

          The models are large enough that loading them all at once in the different unit tests needs more than the default RAM. Therefore, the unit tests unload all models between tests and only run single-threaded.
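          (A minimal sketch of the kind of file-name-keyed cache described above; the patch's actual OpenNLPUtil class may look different, and the class and method names here are illustrative only. Each field analysis would then wrap the shared model in its own tagger instance, e.g. a POSTaggerME, since only the model is shared.)

          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.InputStream;
          import java.util.HashMap;
          import java.util.Map;

          import opennlp.tools.postag.POSModel;

          /** Loads POS models once per file name and hands the immutable model out to callers. */
          final class ModelCache {
            private static final Map<String, POSModel> MODELS = new HashMap<String, POSModel>();

            static synchronized POSModel posModel(String fileName) throws IOException {
              POSModel model = MODELS.get(fileName);
              if (model == null) {
                InputStream in = new FileInputStream(fileName);
                try {
                  model = new POSModel(in); // several megabytes; immutable, safe to share
                } finally {
                  in.close();
                }
                MODELS.put(fileName, model);
              }
              return model;
            }

            /** For tests: drop all cached models so the JVM can reclaim the memory. */
            static synchronized void clear() {
              MODELS.clear();
            }
          }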

          Lance Norskog added a comment -

          License-ready.
          Ivy-ready.
          OpenNLP libraries available through Ivy.
          You still have to download jwnl-1.3.3 from http://sourceforge.net/projects/jwordnet/files/

          And of course download the model files. But this is committable to the Solr side.

          Grant Ingersoll added a comment -

          Very cool Lance. The models are indeed tricky and I wonder how we can properly hook them into the tests, if at all. I wonder how hard it would be to create much smaller ones based on training just a few things.

          Tommaso Teofili added a comment -

          I wonder how hard it would be to create much smaller ones based on training just a few things.

          There was an idea of using the OpenNLP CorpusServer with some Wikinews articles to train them (see OPENNLP-385).

          Joern Kottmann added a comment -

          I am using this mentioned Corpus Server together with the Apache UIMA Cas Editor for labeling projects. If someone wants to set something up to label data we (OpenNLP people) are happy to help with that!

          Grant Ingersoll added a comment -

          Cool!

          I think if we could just get a very small model that can be checked in and used for testing purposes, that is all that would be needed. We don't really need to test OpenNLP, we just need to test that the code properly interfaces with OpenNLP, so a really small model should be fine.

          Grant Ingersoll added a comment -

          This really should just be a part of the analysis modules (with the exception of the Solr example parts). I don't know exactly how we are handling Solr examples anymore, but I seem to recall the general consensus was to not proliferate them. Can we just expose the functionality in the main one?

          I'll update the patch to move this to the module for starters. Not sure on what to do w/ the example part.

          Joern Kottmann added a comment -

          For a test you can run OpenNLP just over a piece of training data, even when trained on a tiny amount of data this will give good results. It does not test OpenNLP, but is sufficient for the desired interface testing.

          Lance Norskog added a comment -

          This really should just be a part of the analysis modules (with the exception of the Solr example parts). I don't know exactly how we are handling Solr examples anymore, but I seem to recall the general consensus was to not proliferate them. Can we just expose the functionality in the main one?

          A lot of Solr/Lucene features are only demoed in solrconfig/schema unit test files (DIH for example). That is fine.

          The models are indeed tricky and I wonder how we can properly hook them into the tests, if at all.

          D'oh! Forgot about that. If we have tagged data in the project, it helps show the other parts of an NLP suite. It's hard to get a full picture of the jigsaw puzzle if you don't know NLP software.

          Lance Norskog added a comment -

          Wiki page is up! http://wiki.apache.org/solr/OpenNLP

          Also, the Solr fancy toolkits had no links from the Solr front page, so I added 'Advanced Tools' with links to UIMA and this.

          Lance Norskog added a comment -

          The models are indeed tricky and I wonder how we can properly hook them into the tests, if at all.

          I have mini training data for sentence detection, tokenization, POS and chunking. The purpose is to make the matching unit tests pass. The data and build script are in a new (unattached) patch.

          NER is proving a tougher nut to crack. I tried annotating several hundred lines of Reuters but no go.

          How would I make an NER dataset that will make OpenNLP spit out one or two tags? Is there a large NER dataset that is Apache-friendly?

          Joern Kottmann added a comment -

          For NER you should try the perceptron and a cutoff of zero; with a cutoff of 5 you otherwise need much more training data.
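          (For reference, those settings map onto OpenNLP 1.5.x's TrainingParameters roughly as sketched below. The exact NameFinderME.train(...) overload varies between 1.5.x releases, so treat this as an outline rather than the commands used to build the patch's test models.)

          import opennlp.tools.util.TrainingParameters;

          final class NerTrainingParams {
            /** Perceptron trainer with a feature cutoff of 0, as suggested above. */
            static TrainingParameters perceptronCutoffZero() {
              TrainingParameters params = new TrainingParameters();
              params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
              params.put(TrainingParameters.CUTOFF_PARAM, "0");
              params.put(TrainingParameters.ITERATIONS_PARAM, "100"); // iteration count is arbitrary here
              // The returned parameters are then passed to NameFinderME.train(...)
              // together with a NameSampleDataStream over the annotated training text.
              return params;
            }
          }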

          Lance Norskog added a comment -

          For NER you should try the perceptron and a cutoff of zero.

          Thanks! This patch generates all models needed by tests, and the tests are rewritten to use the poor quality data from the models. To make the models, go to solr/contrib/opennlp/src/test-files/training and run bin/training.sh. This populates solr/contrib/opennlp/src/test-files/opennlp/conf/opennlp. I don't have windows anymore so I can't make a .bat version.

          Lance Norskog added a comment -

          General status:

          • At this point you have to download 1 library (jwnl) and run a script to make the unit tests work.
          • You have to download several model files from sourceforge to do real work. There is no script to help.
          • The tokenizer and filter are in solr/ not lucene/

          What is missing to make this a full package:

          • Payload handling
            • TokenFilter to parse TAG/term or term_TAG into term/payload.
            • Output code in Solr for the reverse.
            • Payload query for tags.
            • Similarity scoring algorithms for tags.
          • Tag handling
            • There is a universal set of 12 parts-of-speech tags ("A Universal Part-of-Speech Tagset"), with mappings from many language tagsets (Treebank etc.) into the 12 common tags; a sketch of such a mapping follows below. Multi-language sites would benefit from this. I persuaded the authors to switch from GNU to Apache licensing.
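          (A tiny sketch of what such a Treebank-to-universal mapping could look like as a static table; the twelve categories come from the tagset mentioned above, and only a handful of Treebank tags are shown for illustration.)

          import java.util.HashMap;
          import java.util.Map;

          /** Maps a few Penn Treebank tags onto the 12-tag universal part-of-speech set. */
          final class UniversalTagMap {
            static final Map<String, String> TREEBANK_TO_UNIVERSAL = new HashMap<String, String>();
            static {
              TREEBANK_TO_UNIVERSAL.put("NN", "NOUN");
              TREEBANK_TO_UNIVERSAL.put("NNS", "NOUN");
              TREEBANK_TO_UNIVERSAL.put("NNP", "NOUN");
              TREEBANK_TO_UNIVERSAL.put("VB", "VERB");
              TREEBANK_TO_UNIVERSAL.put("VBD", "VERB");
              TREEBANK_TO_UNIVERSAL.put("JJ", "ADJ");
              TREEBANK_TO_UNIVERSAL.put("RB", "ADV");
              TREEBANK_TO_UNIVERSAL.put("IN", "ADP");
              TREEBANK_TO_UNIVERSAL.put("DT", "DET");
              TREEBANK_TO_UNIVERSAL.put("CD", "NUM");
              // The full published mapping covers every Treebank tag, targeting only the categories
              // NOUN, VERB, PRON, ADJ, ADV, ADP, CONJ, DET, NUM, PRT, X and ".".
            }
          }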

          What NLP apps would be useful for search? Coordinate expansion, for example.

          Lance Norskog added a comment -

          This is about finished. The Tokenizer and TokenFilters are moved over into lucene/analysis/opennlp. They do not have unit tests in lucene/ because of the difficulty in supplying model data. They are unit-tested by the factories in solr/contrib/opennlp.

          The solr/example/opennlp directory is gone, as per request. Possible field types are documented in the solrconfig.xml in the unit test resources.

          All jars are downloaded via ivy. The jwnl library is one rev after what this was compiled with. It is only used in collocation, which is not exposed in this release.

          To build, test and commit, there is a bootstrap sequence. In the top-level directory:

            ant clean compile
          

          This downloads the OpenNLP jars

          cd solr/contrib/opennlp/test-files/training
          sh bin/training.sh
          

          This creates low-quality model files in solr/contrib/opennlp/src/test-files/opennlp/solr/collection1/conf/opennlp. In the trunk/solr directory, run

           
          ant example test-contrib
          

          You now have committable binary models. They are small, and only there to run the OpenNLP unit tests. They generate results that are objectively bogus, but the unit tests are matched to the results. If you want real models, you have to download them from sourceforge.

          Lance Norskog added a comment -

          Oops: remove solr/contrib/opennlp/src/test-files/opennlp/solr/collection1/conf/opennlp/.gitignore; otherwise it will prevent you from committing the models.

          Lance Norskog added a comment - - edited

          dev-tools needs updating. I don't have IntelliJ and don't feel comfortable making the right Eclipse files.

          This patch works on both trunk and 4.x. I made a few changes in the build files where modules were out of alphabetic order. Also, the reams of copied code in module-build.xml had blocks out of order. I can't easily see where, but it seems like some of them are missing a few lines that others have.

          Lance Norskog added a comment -

          The Wiki is updated for testing and committing this patch: http://wiki.apache.org/solr/OpenNLP.

          Lance Norskog added a comment -

          There is a regression in Solr which causes this to not work in a Solr example: SOLR-3625. Until this is fixed, you have to copy the Lucene opennlp jar, the Solr opennlp jar, and the solr/contrib/opennlp/lib jars into the solr war.

          Lance Norskog added a comment -

          SOLR-3623 should give a final answer for how to build contribs and Lucene libraries and external dependencies. I've found it a little confusing.

          Lance Norskog added a comment -

          New patch for current build system on trunk & 4.x.

          Lance Norskog added a comment - - edited

          As it turns out, building is still confused: solr/example/solr-webapps comes and goes.

          This build parks the lucene-analyzer-opennlp jar in solr/contrib/opennlp/lucene-libs. example/..../solrconfig.xml includes a reference to ../....../contrib/opennlp/lib and lucene-libs and ../...../dist.

          A jar-of-jars or a fully repacked jar in dist/ is the best way to ship this.

          Bug status: payloads added by this filter do not get written to the index!

          Build-fiddling status: forbidden api checks fail. checksums and licenses validate. rat-sources validate. No dev-tools changes.

          If you want this committed, I'm quite happy to do the last mile.

          alexey added a comment -

          Yes, please, it would be awesome if someone could make this last effort and commit this issue. Many thanks!

          Lance Norskog added a comment -

          Committable except for dev-tools/ and production builds. I've updated dev-tools/eclipse, I don't have IntelliJ. These dev-tools build files contain 'uima' and so need parallel work for 'opennlp':

          dev-tools/maven/lucene/analysis/pom.xml.template
          dev-tools/maven/lucene/analysis/uima/pom.xml.template
          dev-tools/maven/pom.xml.template
          dev-tools/maven/solr/contrib/pom.xml.template
          dev-tools/maven/solr/contrib/uima/pom.xml.template
          dev-tools/scripts/SOLR-2452.patch.hack.pl
            - this one seems to be dead
          
          Lance Norskog added a comment -

          The latest patch is tested fully and painfully in trunk. I'm sure it works as-is in 4.x, but it is not going into 4.0, so I'm not spending time on that.

          Em added a comment -

          Could you please create a new Patch for the current Trunk? I had some problems on applying it to my working copy...

          I am not entirely sure whether it's the trunk or your code, but it seems like your OpenNLP code only works for the first request.

          As far as I was able to debug, the create() method of the TokenFilterFactory is only called every now and then (are TokenFilters reused for more than one call in Solr?).

          If create() of your FilterFactory is called, everything works. However, if the TokenFilter is somehow reused, it fails.

          Is this a bug in Solr or in your patch?

          Em added a comment - - edited

          Some attributes (i.e. the "first" attribute in OpenNLPTokenizer and "indexToken" in OpenNLPFilter) were not reset correctly.

          Since I had trouble applying your patch, I'd like to provide the working source code. Please, create a patch for the current Trunk.
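          (For illustration, the kind of fix being described is a reset() override that clears the per-stream state; the field name comes from the comment above, and this sketch is not Em's actual code. The "first" flag in OpenNLPTokenizer would be restored in the tokenizer's reset() in the same way.)

          import java.io.IOException;

          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;

          /** Sketch of a filter whose buffered state must be cleared when the stream is reset. */
          abstract class StatefulOpenNlpFilter extends TokenFilter {
            private int indexToken; // position within the buffered sentence tokens

            StatefulOpenNlpFilter(TokenStream input) {
              super(input);
            }

            @Override
            public void reset() throws IOException {
              super.reset();   // resets the upstream tokenizer/filter
              indexToken = 0;  // without this, the next request starts from stale state
            }
          }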

          Lance Norskog added a comment -

          Thank you!

          This worked when I posted it. There have been many changes in 4.x and trunk since then. For example, all of the tokenizer and filter factories moved to Lucene from Solr. I'm waiting until 4.0 is finished before I redo this patch.

          Phani Vempaty added a comment -

          Will there be a patch for 4.0 once it is released?

          Patricia Gorla added a comment -

          Thanks for this patch!

          I'm able to get the posTagger working, yet I still have not found a way to incorporate either the Chunker or the NER Models into my Solr project.

          Setting posTagger by itself works, but when I add a link to the chunkerModel (or even just the chunkerModel by itself), I obtain only the tokenized text.

          <fieldType name="text_opennlp_pos" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
             <tokenizer class="solr.OpenNLPTokenizerFactory"
                tokenizerModel="opennlp/en-token.bin" />
             <filter class="solr.OpenNLPFilterFactory" 
                chunkerModel="opennlp/en-chunking.bin"/>
          </analyzer>
          </fieldType>
          

          I'm new to OpenNLP, so any pointers in the right direction would be greatly appreciated.

          Lance Norskog added a comment -

          Wow, someone tried it! I apologize for not noticing your question.

          I'm able to get the posTagger working, yet I still have not found a way to incorporate either the Chunker or the NER Models into my Solr project.

          The schema.xml file includes samples for all of the models:

          /lusolr_4x_opennlp/solr/contrib/opennlp/src/test-files/opennlp/solr/collection1/conf/schema.xml

          This is for the chunker. The chunker works from parts-of-speech tags, not the original words. The chunker needs a parts-of-speech model as well as a chunker model. This should throw an error if the parts-of-speech model is not there. I will fix this.

           <filter class="solr.OpenNLPFilterFactory" 
                    posTaggerModel="opennlp/en-test-pos-maxent.bin"
                    chunkerModel="opennlp/en-test-chunker.bin"
                  />
          

          Is the NER configuration still not working?

          Kai Gülzau added a comment -

          The patch seems to be a bit out of date.
          Applying it to branch_4x or trunk fails (build scripts).

          Kai Gülzau added a comment - - edited

          End of OpenNLPTokenizer.fillBuffer() should be:

          while(length == size) {
            offset += size;
            fullText = Arrays.copyOf(fullText, offset + size);
            length = input.read(fullText, offset, size);
          }
          if (length == -1) {
            length = 0;
          }
          fullText = Arrays.copyOf(fullText, offset + length);
          
          Lance Norskog added a comment -

          Thank you. Have you tried this on the trunk? The Solr components did not work; they could not find the OpenNLP jars.

          Kai Gülzau added a comment - - edited

          I have applied the Patch to trunk, modified the build scripts manually (ignoring javadoc tasks) and built the opennlp jars.
          Jars are running in a vanilla Solr 4.1 environment.

          • solr_server4.1\solr\lib\opennlp\
            • jwnl-1.4_rc3.jar
            • lucene-analyzers-opennlp-5.0-SNAPSHOT.jar (build with patch)
            • opennlp-maxent-3.0.2-incubating.jar
            • opennlp-tools-1.5.2-incubating.jar
            • solr-opennlp-5.0-SNAPSHOT.jar (build with patch)

          with <lib dir="../lib/opennlp" /> in solrconfig.xml

          Works for me: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CB65DA877C3F93B4FB39EA49A1A03C95CC27AB1%40email.novomind.com%3E

          edit: removed jwnl*.jar as stated by Joern

          Joern Kottmann added a comment -

          The jwnl library is only needed if you use the OpenNLP coreference component; otherwise it's safe to exclude it. The 1.4_rc3 version is untested anyway, and the Coreferencer probably does not run with it.

          Rene Nederhand added a comment -

          New patch for both trunk and 4.1 stable. Tested on revision 1450998.

          ant compile
          cd solr/contrib/src/test-files/training
          sh bin/trainall.sh
          cd ../../../../../../solr
          ant example test-contrib
          

          Hope this helps more people in testing OpenNLP integration with Solr.

          TODO:

          • Implementing dev-tools
          • Include references to javadocs
          Maciej Lizewski added a comment -

          Why don't you prepare this as a separate project that produces some jars and config files, with instructions on how to add it to a Solr configuration, instead of publishing all changes as patches to the Solr sources? I am interested in doing some tests with your library, but setting everything up seems quite complicated and hard to maintain in the future... it is just a thought.

          Zack Zullick added a comment -

          Some information for those wanting to try this after fighting it for a day: the latest patch posted, LUCENE-2899-RJN.patch for 4.1, does not have Em's OpenNLPFilter.java and OpenNLPTokenizer.java fixes applied. So after applying the patch, make sure to replace those classes with Em's versions, or the bug that causes the NLP system to be used only on the first request will still be present. I was also able to successfully apply this patch to 4.2.1 with minor modification (mostly to the build/ivy xml files).

          Lance Norskog added a comment -

          Maciej- This is a good point. This package needs changes in a lot of places and it might be easier to package it the way you say.

          Zack- The "churn" in the APIs is a major problem in the Lucene code management. The original patch worked in the 4.x branch and trunk when it was posted. What Em fixed is in an area which is very very basic to Lucene. The API changed with no notice and no change in versions or method names.

          Everyone- It's great that this has gained some interest. Please create a new master patch with whatever changes are needed for the current code base.

          Lucene grand masters- Please don't say "hey kids, write plugins, they're cool!" and then make subtle incompatible changes in APIs.

          Lance Norskog added a comment -

          I'm updating the patches for 4.x and trunk. Kai's fix works. The unit tests did not attempt to analyse text that is longer than the fixed size temp buffer, and thus the code for copying successive buffers was never exercised. Kai's fix handles this problem. I've added a unit test.

          Em: the Lucene Tokenizer lifecycle is that the Tokenizer is created with a Reader, and each call to incrementToken() walks the input. When incrementToken() returns false, that is all: the Tokenizer is finished. TokenStream can support a 'stateful' token stream: with OpenNLPFilter, you call incrementToken() until it returns false, and then you can call 'reset' and it will start over from the beginning. The unit tests include a check that reset() works. The changes you made support a feature that is not supported by Lucene. Also, the changes break most of the unit tests. Please create a unit test that shows the bug, and fix the existing unit tests. No unit test = no bug report.
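          (For context, the sketch below is the standard Lucene 4.x consumer sequence that the unit tests and Solr follow; any Tokenizer or TokenFilter in the patch has to behave correctly under it. This is generic API usage, not code from the patch.)

          import java.io.IOException;
          import java.io.StringReader;

          import org.apache.lucene.analysis.Analyzer;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

          final class ConsumeTokenStream {
            static void printTokens(Analyzer analyzer, String field, String text) throws IOException {
              TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
              CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
              try {
                ts.reset();                      // must be called before the first incrementToken()
                while (ts.incrementToken()) {
                  System.out.println(termAtt.toString());
                }
                ts.end();                        // records the final offset state
              } finally {
                ts.close();
              }
            }
          }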

          I'm posting a patch for the current 4.x and trunk. It includes some changes for TokenStream/TokenFilter method signatures, some refactoring in the unit tests, a little tightening in the Tokenizer & Filter, and Kai's fix. There are unit tests for the problem Kai found, and also a test that has TokenizerFactory create multiple Tokenizer streams. If there is a bug in this patch, please write a unit test which demonstrates it.

          The patch is called LUCENE-2899-current.patch. It is tested against the current 4.x branch and the current trunk.

          Thanks for your interest and hard work- I know it is really tedious to understand this code

          Lance Norskog

          Lance Norskog added a comment -

          I found the problem with multiple documents. The API for reusing Tokenizers changed to something more sensible, but I only noticed and implemented part of the change. The result was that when you upload multiple documents, it just re-processes the first document.

          File LUCENE-2899-x.patch has this fix. It applies against the 4.x branch and the trunk. It does not apply against Lucene 4.0, 4.1, 4.2 or 4.3. For all released Solr versions you want LUCENE-2899.patch from August 27, 2012. There are no new features since that release.

          Joern Kottmann added a comment -

          Lance, does the patch get jwnl from our old SourceForge page? That page is often overloaded and probably makes your build unstable. To solve this issue (see OPENNLP-510) we moved jwnl for 1.5.3 to the central repo. Anyway, as long as you don't use the coreference component you can exclude this dependency.

          Lance Norskog added a comment -

          Yup- upgrading to 1.5.3 is next on the list.

          Lance Norskog added a comment - - edited

          I did not make the right changes to OpenNLPFilter.java to handle the API changes. I have attached a fixed version of this to this issue. Please try it and see if it fixes what you see.

          A-a-a-a-a-a-n-n-n-n-d chunking is broken. Oy.

          Lance Norskog added a comment -

          Fixed the Chunker problem. I switched to the new released version of the OpenNLP packages. The MaxEnt implementation (statistical modeling) for chunking changed slightly, and my test data now produces different noun&verb phrase chunks for the sample text.

          At this point the only problem I know of is that the licenses are slightly wrong, and so 'ant validate' fails.

          These comments only apply to LUCENE-2899x.patch, which applies to the current 4.x and trunk codelines. LUCENE-2899.patch applies to the 4.0-4.3 releases. It is not upgraded to the new OpenNLP release.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Andrew Janowczyk added a comment -

          A little bit of a shameless plug, but we just wrote a blog post here about using the Stanford library for NER as a processor factory / request handler for Solr. It seems applicable to the audience on this ticket; is it worth contributing it to the community via a patch of some sort?

          Lance Norskog added a comment -

          Yup! Another NER is always helpful. But the big problem with NLP software is not the code but the models- do you have a good source of free models?

          Joern Kottmann added a comment -

          Stanford NLP is licensed under GPLv2; this license is not compatible with the AL 2.0, and therefore such a component can't be contributed to an Apache project directly.

          Andrew Janowczyk added a comment -

          Ahhh, thanks for the info. I found a relevant link discussing the licenses which clearly explains the details here. Oh well, it was worth a try.

          Joern Kottmann added a comment -

          @Lance Norskog we now have support in OpenNLP to train the name finder on a corpus in the Brat [1] data format, which makes it much easier to annotate custom data within a couple of days/weeks.

          [1] http://brat.nlplab.org/

          Lance Norskog added a comment -

          Wow! Brat looks bitchin! Looking forward to using it.

          rashi gandhi added a comment -

          Hi,
          I have applied this patch successfully on the latest Solr 4.x branch. But now I am not sure how to perform contextual searches on the data I have. I need to search a text field using some NLP process. I am new to NLP, so I need some help on how to proceed further. How do I train a model using this integrated Solr? Do I need to study something else before moving ahead with this?

          I designed an analyzer and tried indexing data, but the results are weird and inconsistent. Kindly provide some pointers to move ahead.

          Thanks in advance.

          rashi gandhi added a comment -

          Hi,

          I designed an analyzer using OpenNLP filters and indexed some data on it.

          <fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-sent.bin" tokenizerModel="opennlp/en-token.bin"/>
              <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
              <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW" keepPayloads="true"/>
              <filter class="solr.StripPayloadsFilterFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
              <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
            </analyzer>
          </fieldType>

          <field name="Detail_Nvf" type="text_opennlp_nvf" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>

          My problem is: while searching, Solr sometimes returns results and sometimes not (but the documents are there).
          For example, if I search for Detail_Nvf:brett, it returns a document,
          and after some time, if I fire the same query again, it returns zero documents.
          I am not getting why the Solr results are unstable.
          Please help me with this.

          Thanks in Advance

          Zack Zullick added a comment -

          I have seen this behavior before (see previous comments, especially from user Em and his earlier fix) and I am experiencing similar results with the latest patch uploaded (Jun-16-2013) on 4.4/branch_4x. In my case, the OpenNLP system only works when indexing the first document and no longer works thereafter. It seems you are having a similar issue, except that yours happens on the query end rather than during indexing. I sent an email to Lance to see if he has any advice for us.

          rashi gandhi added a comment -

          Thanks Zack

          Waiting for a reply from Lance

          simon raphael added a comment -

          Hi,

          I'm new to Solr and OpenNLP.
          I have followed the tutorial to install this patch. I downloaded branch_4x, then downloaded and applied LUCENE-2899-current.patch, and then ran "ant compile".

          Everything works fine, but no opennlp folder in /solr/contrib/ is created.

          What am I doing wrong?

          Thanks for your help

          Lance Norskog added a comment -

          Hi-

          The latest patch is LUCENE-2899-x.patch, pls try that. Also, apply it with:
          patch -p0 < patchfile

          Lance

          Lance Norskog added a comment - - edited

          This patch includes a fix for the problem where searching twice doesn't work. The file is LUCENE-2899.patch
          It has been tested with trunk, branch_4x and the 4.5.1 release.

          I do not know of any outstanding issues. To avoid confusion, I have removed all old patches.

          simon raphael added a comment -

          Hi,

          I have a problem after installing the patch: I can't launch Solr anymore. I get the following error:

          Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'

          Though the opennlp*.jar files are correctly added:

          Adding 'file:/var/www/lucene_solr_4_5_1/solr/contrib/opennlp/lib/opennlp-tools-1.5.3.jar' to classloader
          5453 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/var/www/lucene_solr_4_5_1/solr/contrib/opennlp/lib/opennlp-maxent-3.0.3.jar' to classloader

          Any idea what I am doing wrong?

          Thank you

          Lance Norskog added a comment -

          The solrconfig.xml file should have these lines in the library set:

          <lib dir="../../../contrib/opennlp/lib" regex=".*\.jar" />
          <lib dir="../../../dist/" regex="solr-opennlp-\d.*\.jar" />

          Also, you have to copy lucene/build/analysis/opennlp/lucene-analyzers-opennlp*.jar to solr/contrib/opennlp/lib/.

          This last problem was a mess. I have not followed these issues: SOLR-3664, LUCENE-5249, LUCENE-5257. I don't know if they handle the problem I described. Shipping this thing as a Lucene/Solr contrib module patch was a mistake: it intersects the build and code structure in too many places.

          Markus Jelsma added a comment -

          Hi - any chance this is going to get committed some day?

          Robert Muir added a comment -

          Hi Markus: I haven't looked at this patch. I'll review it now and give my thoughts.

          Robert Muir added a comment -

          Just some thoughts:

          I think it would be best to split out the different functionality here into subtasks for each piece, and figure out how each should best be integrated.

          The current patch does strange things to try to deal with some impedance mismatch due to the design here, such as the token filter which consumes the entire analysis chain and then replays the whole thing back with POS or NER as payloads. Is it really necessary to give this thing more scope than a single sentence? Typically such tagging models (at least the ones I've worked with) tend to be trained only within sentence scope.

          Also, payloads should not be used internally; instead, things like TypeAttribute should be used for POS tags. If someone wants to filter out or keep certain POS, they can use already existing stuff like TypeTokenFilter; if they want to index the type as a payload, they can use TypeAsPayloadTokenFilter, and so on.

          While I can see this POS tagging being useful inside the analysis chain, the NER case is much less clear. I think it's more important for NER to be integrated outside of the analysis chain so that named entities/mentions can be faceted on, added to separate fields for search (likely with a different analysis chain for that), etc. So for Lucene that would be an easier way to add these as facets; for Solr it probably makes more sense as an UpdateProcessor than as an analysis chain.

          Finally: I'm confused as to what benefit we get from using OpenNLP directly, versus integrating with it via opennlp-uima. Our UIMA integration at various levels (analysis chain/update processor) is already there, so I'm just wondering if that's a much shorter path.

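          A minimal sketch of the TypeAttribute route described above, assuming the upstream tagger writes each POS tag into TypeAttribute; Lucene's existing TypeTokenFilter already provides this behavior, so the class below is only an illustration of the idea, not part of the patch:

          import java.io.IOException;
          import java.util.Set;

          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
          import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

          /** Keeps only tokens whose TypeAttribute (e.g. a POS tag) is in the whitelist. */
          public final class KeepTypesFilter extends TokenFilter {
            private final Set<String> keepTypes;
            private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
            private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);

            public KeepTypesFilter(TokenStream input, Set<String> keepTypes) {
              super(input);
              this.keepTypes = keepTypes;
            }

            @Override
            public boolean incrementToken() throws IOException {
              int skipped = 0;
              while (input.incrementToken()) {
                if (keepTypes.contains(typeAtt.type())) {
                  // account for the dropped tokens so phrase queries keep working
                  posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + skipped);
                  return true;
                }
                skipped += posIncAtt.getPositionIncrement();
              }
              return false;
            }
          }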
          Benson Margulies added a comment -

          I know of an NER model that looks at the entire text to bias towards consistent tagging of entities in larger units. However, I agree that crocks are bad. Perhaps this is an opportunity to think about how to expand the analysis protocol to support this sort of thing more smoothly?

          It would be desirable if this integration were to start with a set of Token Attributes that could be used in any number of analysis components, inside or outside of Lucene, that were in a position to deliver similar items. I suppose I'm late to ask for this, as the UIMA component must pose the same question.

          In some languages, NER is very clumsy as a token filter, because entities don't obey token boundaries very well. Also, in my experience, entities aren't useful as additional tokens in the same field as their source text, but rather in their own field (where they can be faceted upon, for example). Is there any appetite to look at Lucene support for a stream that delivers to more than one field? Or is there such a thing and I've missed it?

          I agree with Rob about UIMA because I think that Lucene analysis attributes are a weak data model for interconnecting NLP modules and flowing data through them – and one frequently needs to do that.

          Robert Muir added a comment -

          I don't think we should expand the analysis protocol: I think it's actually already more complicated than it needs to be.

          It doesn't need to work across multiple fields or support things like NER.

          I know people disagree, but I don't care (typically they don't do a lot of work to maintain this code).

          I'll fight it to the death: Lucene's analysis is about doing information retrieval (search and query), and it's already overly complex. It should stay per-field, and it should stay the state machine it is.

          Stuff like this NER should NOT be in the analysis chain. As I said, it's more useful in the "document build" phase anyway.

          Benson Margulies added a comment -

          Fair enough. Solr URPs do this very well upstream of analysis. ES doesn't have the concept; perhaps it should. It clarifies the situation nicely to think of Lucene as serial token operations.

          Christian Moen added a comment -

          Stuff like this NER should NOT be in the analysis chain. As I said, it's more useful in the "document build" phase anyway.

          +1

          Benson, as far as I understand, ES doesn't have the concept by design.

          Joern Kottmann added a comment -

          UIMA-based NLP pipelines can use components like Solrcas or Lucas to write their results to an index. This works really well in my experience.

          rashi gandhi added a comment -

          Hi,

          I have successfully applied LUCENE-2899.patch to Solr 4.5.1 and it's working properly.
          Now, my requirement is to combine OpenNLP with JWNL.
          Is it possible to combine OpenNLP with JWNL, and what changes are required in the Solr schema.xml for this?
          Kindly provide some pointers to move ahead.

          Thanks in Advance

          Lance Norskog added a comment - - edited

          All fair criticisms.

          About UIMA: clearly it is much more advanced than this design, but I'm not smart enough to use it. I've tried to put together something useful (a few times) and each time I was completely confused. I learn by example, and the examples are limited. Also there is very little traffic on the mailing lists etc. about UIMA.

          About payloads vs. internal attributes: the examples don't use this feature, but payloads are stored in the index. This supports a question-answering system. Add PERSON payloads with all records, then search for "word X AND 'payload PERSON anywhere'" when someone asks "who is X". This does the tagging during indexing, but not searching. A better design would be to add PERSON as a synonym rather than a payload. I also don't see much traffic about payloads.

          About doing this in the analysis pipeline vs. upstream: yes, upstream update request processors are the right place for this in Solr. URPs don't exist in ES or in plain Lucene coding.

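          A rough sketch of the payload-at-query-time idea above, assuming the indexing chain stored the NER tag as the token payload and that the field name ("text") and byte encoding (UTF-8 "PERSON") match what was actually written; this variant requires the payload on the matched term itself, since a true "payload PERSON anywhere in the document" query would need custom code:

          import java.nio.charset.StandardCharsets;
          import java.util.Collections;

          import org.apache.lucene.index.Term;
          import org.apache.lucene.search.Query;
          import org.apache.lucene.search.spans.SpanPayloadCheckQuery;
          import org.apache.lucene.search.spans.SpanTermQuery;

          public class PersonPayloadQueryExample {
            /** Matches the given term only where it was indexed with a PERSON payload. */
            public static Query whoIs(String name) {
              SpanTermQuery term = new SpanTermQuery(new Term("text", name));
              return new SpanPayloadCheckQuery(
                  term, Collections.singletonList("PERSON".getBytes(StandardCharsets.UTF_8)));
            }
          }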
          Lance Norskog added a comment -

          JWNL is WordNet. Lucene has a WordNet parser for use as a synonym filter.
          http://lucene.apache.org/core/4_0_0/analyzers-common/index.html?org/apache/lucene/analysis/synonym/SynonymMap.html

          I don't know how to use this from a Solr filter factory. Please ask this on the Solr mailing list.

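          For the Lucene-level route in that link, a minimal sketch assuming the WordNet prolog synonym file (wn_s.pl) and the 4.x synonym APIs; the file path and analyzer choice here are only examples, and on the Solr side the stock SynonymFilterFactory with format="wordnet" may be the simpler way to consume the same file:

          import java.io.FileReader;
          import java.io.Reader;

          import org.apache.lucene.analysis.Analyzer;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
          import org.apache.lucene.analysis.synonym.SynonymFilter;
          import org.apache.lucene.analysis.synonym.SynonymMap;
          import org.apache.lucene.analysis.synonym.WordnetSynonymParser;
          import org.apache.lucene.util.Version;

          public class WordnetSynonymsExample {
            /** Parses a WordNet prolog file (e.g. wn_s.pl) into a SynonymMap. */
            public static SynonymMap loadWordnet(String path) throws Exception {
              Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_45);
              WordnetSynonymParser parser = new WordnetSynonymParser(true, true, analyzer);
              Reader in = new FileReader(path);
              try {
                parser.parse(in);
              } finally {
                in.close();
              }
              return parser.build();
            }

            /** Wraps an analysis chain so synonyms are injected at the same positions. */
            public static TokenStream addSynonyms(TokenStream input, SynonymMap map) {
              return new SynonymFilter(input, map, true /* ignoreCase */);
            }
          }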
          rashi gandhi added a comment - - edited

          OK, thanks Lance. One more question:
          I want to design an analyzer that can support a location containment relationship,
          for example Europe->France->Paris.
          My requirement is: when a user searches for any country, the results must include the documents containing that country, as well as the documents containing states and cities that fall under that country.
          But documents with the country name must have higher relevancy.
          It must obey the containment relationship up to 4 levels, i.e. Continent->Country->State->City.
          I want to know whether there is any way in OpenNLP that can support this type of search.
          Can the location tagger model be used for this?
          Please provide me some pointers to move ahead.

          Thanks in Advance

          Uwe Schindler added a comment -

          Move issue to Lucene 4.9.

          rashi gandhi added a comment -

          Hi,

          I have one running Solr core with some data indexed, with Solr deployed on Tomcat.
          This core is designed to provide OpenNLP functionality for indexing and searching.
          So I have kept the following binary models at this location: \apache-tomcat-7.0.53\solr\collection1\conf\opennlp
          • en-sent.bin
          • en-token.bin
          • en-pos-maxent.bin
          • en-ner-person.bin
          • en-ner-location.bin

          My problem is: when I unload the running core and try to delete the conf directory from it,
          it does not allow me to delete the directory, prompting that en-sent.bin and en-token.bin are in use.
          All other files in the conf directory are deleted except en-sent.bin and en-token.bin.
          If I have unloaded the core, why is it not releasing its hold on these files?
          Is this a known issue with the OpenNLP binaries?
          How can I release the connection between the unloaded core and the conf directory (especially the binary models)?

          Please provide me some pointers on this.
          Thanks in Advance

          vivek added a comment -

          I followed this link to integrate OpenNLP: https://wiki.apache.org/solr/OpenNLP

          Installation

          For English language testing: Until LUCENE-2899 is committed:

          1. pull the latest trunk or 4.0 branch
          2. apply the latest LUCENE-2899 patch
          3. do 'ant compile'
          cd solr/contrib/opennlp/src/test-files/training
          ...
          I followed the first two steps but got the following error while executing the third step:

          common.compile-core:
          [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java

          [javac] warning: [path] bad path element "/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar": no such file or directory

          [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol

          [javac] super(Version.LUCENE_44, input);

          [javac] ^
          [javac] symbol: variable LUCENE_44
          [javac] location: class Version
          [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
          [javac] super(input);
          [javac] ^
          [javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
          [javac] (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
          [javac] constructor Tokenizer.Tokenizer() is not applicable
          [javac] (actual and formal argument lists differ in length)
          [javac] 2 errors
          [javac] 1 warning

          I'm really stuck on how to get past this step. I wasted my entire day trying to fix this but couldn't move a bit. Can someone please help me?

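          The compiler output above shows what happens when the 4.x-targeted patch is built against a trunk checkout: Version.LUCENE_44 no longer exists there, and Tokenizer only offers the () and (AttributeFactory) constructors, so the patch's super(input) call with a Reader cannot compile. A minimal trunk-style tokenizer skeleton, shown only to illustrate the constructor difference (the real OpenNLPTokenizer has more to it), looks roughly like this:

          import java.io.IOException;

          import org.apache.lucene.analysis.Tokenizer;
          import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

          // The Reader is no longer passed to super(); the indexing chain supplies it via setReader().
          public final class SkeletonTokenizer extends Tokenizer {
            private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

            public SkeletonTokenizer() {
              super();               // branch_4x code would have called super(input) with a Reader here
            }

            @Override
            public boolean incrementToken() throws IOException {
              clearAttributes();
              return false;          // placeholder; a real tokenizer reads from the inherited 'input' field
            }
          }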

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              21
              Watchers:
              40
