  1. Stanbol (Retired)
  2. STANBOL-1251

POS-tag-based Phrase Extraction Engine


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: Enhancement Engines
    • Labels: None

    Description

      Implement an Enhancement Engine that uses POS tags to extract noun and verb phrases.

      In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see the documentation at [1] for details). This alignment lets engines determine the lexical categories of the tokens in a text in a language-independent way.

      The POS-Chunker Engine will use those lexical categories to extract noun and verb phrases according to the following rules:

          1. Noun Phrases
      • start: noun, pronoun, determiner, adjective
      • continuation: noun, adposition, adjective, punctuation
      • end: noun, pronoun, determiner, adjective
      • required: noun
          2. Verb Phrases
      • start: verb, adverb
      • continuation: verb, adverb, punctuation
      • end: verb, adverb
      • required: verb
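The rules above can be sketched in a few lines of Python. This is only an illustration of one possible reading of the rules, not the engine's actual implementation: the names `PhraseRule` and `extract_phrases` are hypothetical, the category strings stand in for OLIA lexical categories, and the sketch assumes that a phrase may be extended by any token in the continuation or end set and is then cut back to the last token in the end set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseRule:
    """Hypothetical container for one phrase type's category sets."""
    name: str
    start: frozenset
    continuation: frozenset
    end: frozenset
    required: frozenset


NOUN_PHRASE = PhraseRule(
    name="NP",
    start=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    continuation=frozenset({"noun", "adposition", "adjective", "punctuation"}),
    end=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    required=frozenset({"noun"}),
)

VERB_PHRASE = PhraseRule(
    name="VP",
    start=frozenset({"verb", "adverb"}),
    continuation=frozenset({"verb", "adverb", "punctuation"}),
    end=frozenset({"verb", "adverb"}),
    required=frozenset({"verb"}),
)


def extract_phrases(tokens, rule):
    """Return (start, end_exclusive) token spans matching `rule`.

    `tokens` is a list of (word, lexical_category) pairs.
    Assumption of this sketch: continuation OR end categories may extend a
    phrase; trailing tokens that are not valid end tokens are trimmed.
    """
    spans = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i][1] not in rule.start:
            i += 1
            continue
        # Remember the last position at which the phrase could validly end.
        last_end = i if tokens[i][1] in rule.end else None
        j = i + 1
        while j < n and (tokens[j][1] in rule.continuation
                         or tokens[j][1] in rule.end):
            if tokens[j][1] in rule.end:
                last_end = j
            j += 1
        if last_end is not None:
            categories = {cat for _, cat in tokens[i:last_end + 1]}
            if categories & rule.required:
                spans.append((i, last_end + 1))
                i = last_end + 1
                continue
        i += 1
    return spans


SENTENCE = [
    ("The", "determiner"), ("quick", "adjective"), ("brown", "adjective"),
    ("fox", "noun"), ("quickly", "adverb"), ("jumped", "verb"),
    ("over", "adposition"), ("the", "determiner"), ("lazy", "adjective"),
    ("dog", "noun"), (".", "punctuation"),
]

noun_spans = extract_phrases(SENTENCE, NOUN_PHRASE)
verb_spans = extract_phrases(SENTENCE, VERB_PHRASE)
```

For the sample sentence, the noun phrase rule yields the spans covering "The quick brown fox" and "the lazy dog" (the trailing full stop is trimmed because punctuation is not a valid end category), and the verb phrase rule yields "quickly jumped". A run such as adverb-adverb produces no verb phrase, because the required category is missing.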

      The engine will allow the processed languages to be configured (e.g. to deactivate it for languages where other chunkers are available).
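As a rough illustration of such a language filter, the following sketch decides whether a language should be processed. The configuration syntax used here (a comma-separated list in which `*` means "all languages" and a leading `!` excludes one) and the function name `is_processed` are assumptions made for this example only; the issue does not specify the engine's configuration format.

```python
def is_processed(language, config):
    """Return True if `language` should be handled by the chunker.

    `config` is a hypothetical comma-separated setting such as "*,!de"
    (everything except German) or "en,it" (only English and Italian).
    Explicit exclusions win over inclusions and over the wildcard.
    """
    entries = [e.strip().lower() for e in config.split(",") if e.strip()]
    excluded = {e[1:] for e in entries if e.startswith("!")}
    included = {e for e in entries if not e.startswith("!") and e != "*"}
    lang = language.lower()
    if lang in excluded:
        return False
    return lang in included or "*" in entries
```

With the setting `"*,!de"`, English text would be chunked while German text would be left to another chunker.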

      The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.

      The current plan is to also make this engine available in the 0.12 branch.

      [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations

          People

            Assignee: Rupert Westenthaler (rwesten)
            Reporter: Rupert Westenthaler (rwesten)