[STANBOL-733] Stanbol NLP processing - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: enhancer-0.10.0
Component/s: Enhancer
Labels: None

Description

This issue covers the NLP processing components as discussed in http://markmail.org/message/qxusiup3mim2lhpx

Goals
=====

1. provide a modular infrastructure for NLP-related things

Many tasks in NLP can be computationally intensive, and there is no
"one-size-fits-all" NLP approach when analysing text. Therefore, we wanted to
have an NLP infrastructure that can be configured and wired together as needed
for the specific use case, with several specialised modules that can build upon
each other but many of which are optional.
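
The wiring idea can be illustrated with a minimal sketch. It assumes that each
module declares the annotation layer it provides and the layers it requires, and
that a chain rejects a module whose requirements are not met by the modules
added before it. All names here (NlpModule, SimpleModule, NlpChain) are
hypothetical and are not part of the Stanbol Enhancer API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these types are NOT the Stanbol Enhancer API.
interface NlpModule {
    String provides();       // annotation layer this module produces
    Set<String> requires();  // layers that must already be available
}

final class SimpleModule implements NlpModule {
    private final String provides;
    private final Set<String> requires;

    SimpleModule(String provides, Set<String> requires) {
        this.provides = provides;
        this.requires = requires;
    }

    public String provides() { return provides; }

    public Set<String> requires() { return requires; }
}

final class NlpChain {
    private final List<NlpModule> modules = new ArrayList<>();
    private final Set<String> available = new LinkedHashSet<>();

    // Adds a module only if every layer it requires is already provided.
    NlpChain add(NlpModule module) {
        if (!available.containsAll(module.requires())) {
            throw new IllegalStateException(
                "missing prerequisites for " + module.provides());
        }
        modules.add(module);
        available.add(module.provides());
        return this;
    }

    Set<String> available() { return available; }
}
```

With this sketch, a chain of tokenizer, POS tagger and chunker can be wired in
that order, while a POS tagger added to an empty chain is rejected. Optional
modules are simply left out of the chain.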

2. provide a unified data model for representing NLP text annotations

In many scenarios, it will be necessary to implement custom engines that build
on the results of a previous "generic" analysis of the text (e.g. POS tagging
and chunking). For example, in one project we identify so-called "noun
phrases", use a lemmatizer to build the base form, and then convert it to the
singular nominative form to obtain a grammatically correct label for use in a
tag cloud. Most of this builds on generic NLP functionality, but the last step
is very specific to the use case.

Therefore, we also wanted to implement a generic NLP data model that allows
representing text annotations attached to individual words or to spans of
words.
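
A minimal sketch of such a data model is shown here, assuming that spans are
identified by character offsets into the text and carry key/value annotations.
The names Span and AnnotatedText and all of their methods are hypothetical; they
illustrate the idea only and do not reproduce the AnalyzedText content part
tracked in the sub-tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and method names are NOT the Stanbol API.
final class Span {
    final int start; // inclusive character offset
    final int end;   // exclusive character offset
    private final Map<String, Object> annotations = new HashMap<>();

    Span(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid offsets");
        }
        this.start = start;
        this.end = end;
    }

    void annotate(String key, Object value) { annotations.put(key, value); }

    Object annotation(String key) { return annotations.get(key); }
}

final class AnnotatedText {
    final String text;
    private final List<Span> spans = new ArrayList<>();

    AnnotatedText(String text) { this.text = text; }

    // Registers a span over [start, end) of the text.
    Span addSpan(int start, int end) {
        if (end > text.length()) {
            throw new IllegalArgumentException("span exceeds text");
        }
        Span span = new Span(start, end);
        spans.add(span);
        return span;
    }

    String covered(Span span) { return text.substring(span.start, span.end); }

    // Returns all other spans that lie completely inside the given span.
    List<Span> enclosedBy(Span outer) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s != outer && s.start >= outer.start && s.end <= outer.end) {
                result.add(s);
            }
        }
        return result;
    }
}
```

In this sketch a POS tag would be attached to a single-word span, while a chunk
such as a noun phrase would be a longer span over the same text. A custom
engine could then look up the word spans enclosed by a noun phrase and read
their annotations.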

Attachments

srfgkmt-stanbol-nlp.zip	17/Sep/12 14:37	139 kB	Sebastian Schaffert

Issue Links

relates to

STANBOL-738	CELI Lemmatizer Engine	Resolved
STANBOL-760	Sentiment Summarization EnhancementEngine	Resolved
STANBOL-741	NLP 2 RDF Enhancement Engine	Closed

Sub-Tasks

1.	ContentPart for NLP data - AnalyzedText	Closed	Rupert Westenthaler
2.	OpenNLP POS Tagger Engine	Closed	Unassigned
3.	OpenNLP Chunker Engine	Closed	Unassigned
4.	Sentiment Tagger Engine	Closed	Rupert Westenthaler
5.	Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart	Closed	Rupert Westenthaler
6.	Adopt the KeywordLinkingEngine to use the AnalyzedText content part	Closed	Rupert Westenthaler
7.	OpenNLP Tokenizer Engine	Closed	Rupert Westenthaler
8.	OpenNLP Sentence Detection Engine	Closed	Rupert Westenthaler
9.	Adapt the OpenNLP NER engine to support the AnalyzedText ContentPart	Closed	Rupert Westenthaler
10.	Rename the AnalyzedText based KeywordLinkingEngine to EntityhubLinkingEngine	Closed	Rupert Westenthaler
11.	Add Integration-Tests for the Stanbol NLP processing module	Resolved	Rupert Westenthaler
12.	Refactor EntityLinkingEngine so that it does no longer depend on the Stanbol Entityhub Component	Closed	Rupert Westenthaler

People

Assignee: Unassigned

Reporter: Rupert Westenthaler

Votes: 0

Watchers: 2

Dates

Created: 10/Sep/12 09:36

Updated: 17/Jul/13 15:16

Resolved: 14/Dec/12 07:14