  1. Stanbol (Retired)
  2. STANBOL-733

Stanbol NLP processing


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: enhancer-0.10.0
    • Component/s: Enhancer
    • Labels: None

    Description

      This issue covers the NLP processing components as discussed in http://markmail.org/message/qxusiup3mim2lhpx

      Goals
      =====

      1. provide a modular infrastructure for NLP processing

      Many NLP tasks are computationally intensive, and there is no
      one-size-fits-all approach to analysing text. We therefore wanted an NLP
      infrastructure that can be configured and wired together as needed for a
      specific use case: several specialised modules that can build upon each
      other, many of which are optional.
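
      As a rough illustration of this goal, the following sketch wires optional
      modules into a pipeline. All names here (NlpModule, NlpPipeline, the
      "tokens" layer) are hypothetical and chosen for this example only; they
      are not the API proposed in this issue or in the attached archive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical module contract: reads the text and extends a shared result map. */
interface NlpModule {
    String name();

    /** Result layers this module builds on; empty if it works on raw text. */
    List<String> requires();

    void process(String text, Map<String, Object> layers);
}

/** Runs only the modules that were configured, in the order they were added. */
final class NlpPipeline {
    private final List<NlpModule> modules = new ArrayList<>();

    NlpPipeline add(NlpModule module) {
        modules.add(module);
        return this;
    }

    Map<String, Object> run(String text) {
        Map<String, Object> layers = new LinkedHashMap<>();
        for (NlpModule module : modules) {
            // Fail early if a module is wired in without its prerequisites.
            for (String needed : module.requires()) {
                if (!layers.containsKey(needed)) {
                    throw new IllegalStateException(
                            module.name() + " requires missing layer: " + needed);
                }
            }
            module.process(text, layers);
        }
        return layers;
    }
}

/** Toy tokenizer (assumption: whitespace splitting is good enough for a demo). */
final class WhitespaceTokenizer implements NlpModule {
    public String name() { return "tokenizer"; }
    public List<String> requires() { return List.of(); }
    public void process(String text, Map<String, Object> layers) {
        layers.put("tokens", List.of(text.trim().split("\\s+")));
    }
}

/** Toy downstream module that builds on the tokenizer's output. */
final class TokenCounter implements NlpModule {
    public String name() { return "token-counter"; }
    public List<String> requires() { return List.of("tokens"); }
    public void process(String text, Map<String, Object> layers) {
        List<?> tokens = (List<?>) layers.get("tokens");
        layers.put("tokenCount", tokens.size());
    }
}
```

      A configuration that needs only tokenization would add just the
      tokenizer; one that also needs counts would add both, e.g.
      new NlpPipeline().add(new WhitespaceTokenizer()).add(new TokenCounter()).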

      2. provide a unified data model for representing NLP text annotations

      In many scenarios, it will be necessary to implement custom engines that build on
      the results of a previous "generic" analysis of the text (e.g. POS tagging and
      chunking). For example, in one project we identify so-called "noun
      phrases", use a lemmatizer to obtain the base form, and then convert it to the
      singular nominative form to get a grammatically correct label for use in a tag
      cloud. Most of this builds on generic NLP functionality, but the last step is
      very specific to the use case.

      Therefore, we also wanted to implement a generic NLP data model that can
      represent text annotations attached to individual words or to spans of
      words.
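
      One way to picture such a data model is sketched below: a span is a
      character range of the text that carries key/value annotations, so a
      single word and a multi-word chunk are represented the same way. The
      class names (Span, AnnotatedText) and the annotation keys and values
      ("pos", "phrase", "NN", "NP") are assumptions made for this example, not
      the model defined by this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical span: a character range plus free-form annotations. */
final class Span {
    final int start; // inclusive character offset
    final int end;   // exclusive character offset
    private final Map<String, Object> annotations = new HashMap<>();

    Span(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        this.start = start;
        this.end = end;
    }

    /** Attaches an annotation and returns this span so calls can be chained. */
    Span annotate(String key, Object value) {
        annotations.put(key, value);
        return this;
    }

    Object annotation(String key) {
        return annotations.get(key);
    }

    boolean has(String key) {
        return annotations.containsKey(key);
    }

    String coveredText(String text) {
        return text.substring(start, end);
    }
}

/** Hypothetical container: the analysed text and all spans registered on it. */
final class AnnotatedText {
    final String text;
    private final List<Span> spans = new ArrayList<>();

    AnnotatedText(String text) {
        this.text = text;
    }

    Span addSpan(int start, int end) {
        if (end > text.length()) {
            throw new IllegalArgumentException("span ends beyond the text: " + end);
        }
        Span span = new Span(start, end);
        spans.add(span);
        return span;
    }

    /** Lets a later engine pick up results of an earlier one by annotation key. */
    List<Span> spansWith(String key) {
        List<Span> result = new ArrayList<>();
        for (Span span : spans) {
            if (span.has(key)) {
                result.add(span);
            }
        }
        return result;
    }
}
```

      For the text "The quick fox jumps", a POS tagger could register the word
      span addSpan(10, 13).annotate("pos", "NN") and a chunker the phrase span
      addSpan(0, 13).annotate("phrase", "NP"); a use-case-specific engine
      could then query spansWith("phrase") without knowing which engine
      produced the spans.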

      Attachments

        1. srfgkmt-stanbol-nlp.zip
          139 kB
          Sebastian Schaffert

            People

              Assignee: Unassigned
              Reporter: Rupert Westenthaler (rwesten)