Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
This issue covers the NLP processing components as discussed in http://markmail.org/message/qxusiup3mim2lhpx
Goals
=====
1. provide a modular infrastructure for NLP-related processing
Many tasks in NLP can be computationally intensive, and there is no
"one-size-fits-all" NLP approach when analysing text. Therefore, we wanted an NLP
infrastructure that can be configured and wired together as needed for the
specific use case, with several specialised modules that can build upon each
other but many of which are optional.
2. provide a unified data model for representing NLP text annotations
In many scenarios, it will be necessary to implement custom engines that build on
the results of a previous "generic" analysis of the text (e.g. POS tagging and
chunking). For example, in one project we identify so-called "noun
phrases", use a lemmatizer to build the base form, and then convert this to
singular nominative form to get a grammatically correct label for use in a tag
cloud. Most of this builds on generic NLP functionality, but the last step is
very specific to the use case.
Therefore, we also wanted to implement a generic NLP data model that allows
representing text annotations attached to individual words or to spans of
words.
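To make the two goals more concrete, here is a minimal sketch in Java. It is
not the API proposed in this issue: all names (NlpModelSketch, Span,
AnalysedDocument, NlpModule, the toy modules and the "pos" / "phrase"
annotation keys) are hypothetical and chosen only for illustration. The sketch
assumes that modules share one document object, that annotations are attached
to character spans covering either a single word or several words, and that a
pipeline is simply an ordered list of optional modules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NlpModelSketch {

    /** A character span in the text: a single word or a run of words. */
    static final class Span {
        final int start; // inclusive character offset
        final int end;   // exclusive character offset
        final Map<String, Object> annotations = new LinkedHashMap<>();

        Span(int start, int end) { this.start = start; this.end = end; }

        String text(String source) { return source.substring(start, end); }
    }

    /** Shared data model that every module reads from and writes to. */
    static final class AnalysedDocument {
        final String text;
        final List<Span> tokens = new ArrayList<>();
        final List<Span> chunks = new ArrayList<>();

        AnalysedDocument(String text) { this.text = text; }
    }

    /** One optional processing step (hypothetical interface). */
    interface NlpModule { void process(AnalysedDocument doc); }

    /** Splits the text on whitespace and records one span per word. */
    static final class WhitespaceTokenizer implements NlpModule {
        public void process(AnalysedDocument doc) {
            int i = 0, n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
                int start = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
                if (i > start) doc.tokens.add(new Span(start, i));
            }
        }
    }

    /** Toy POS tagger: tiny lookup table, unknown words default to NOUN. */
    static final class ToyPosTagger implements NlpModule {
        private static final Map<String, String> LEXICON =
                Map.of("the", "DET", "a", "DET", "big", "ADJ", "barks", "VERB");

        public void process(AnalysedDocument doc) {
            for (Span t : doc.tokens) {
                String word = t.text(doc.text).toLowerCase(Locale.ROOT);
                t.annotations.put("pos", LEXICON.getOrDefault(word, "NOUN"));
            }
        }
    }

    /** Toy chunker: groups maximal runs of DET/ADJ/NOUN tokens into one span. */
    static final class ToyNounPhraseChunker implements NlpModule {
        public void process(AnalysedDocument doc) {
            int start = -1, end = -1;
            for (Span t : doc.tokens) {
                Object pos = t.annotations.get("pos");
                boolean nominal = "DET".equals(pos) || "ADJ".equals(pos) || "NOUN".equals(pos);
                if (nominal) {
                    if (start < 0) start = t.start;
                    end = t.end;
                } else if (start >= 0) {
                    addChunk(doc, start, end);
                    start = -1;
                }
            }
            if (start >= 0) addChunk(doc, start, end);
        }

        private static void addChunk(AnalysedDocument doc, int start, int end) {
            Span chunk = new Span(start, end);
            chunk.annotations.put("phrase", "NP");
            doc.chunks.add(chunk);
        }
    }

    /** Runs the configured modules in order over a fresh document. */
    static AnalysedDocument analyse(String text, List<NlpModule> pipeline) {
        AnalysedDocument doc = new AnalysedDocument(text);
        for (NlpModule module : pipeline) module.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        AnalysedDocument doc = analyse("The big dog barks",
                List.of(new WhitespaceTokenizer(), new ToyPosTagger(), new ToyNounPhraseChunker()));
        for (Span t : doc.tokens) System.out.println(t.text(doc.text) + " " + t.annotations);
        for (Span c : doc.chunks) System.out.println(c.text(doc.text) + " " + c.annotations);
    }
}
```

A use-case-specific engine, such as the tag cloud label generation described
above, would be one more module appended to the list. It would read the chunk
spans written by the generic modules and add its own annotations to them.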
Issue Links
- relates to
  - STANBOL-738 CELI Lemmatizer Engine (Resolved)
  - STANBOL-760 Sentiment Summarization EnhancementEngine (Resolved)
  - STANBOL-741 NLP 2 RDF Enhancement Engine (Closed)