Stanbol / STANBOL-538

Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the KeywordLinkingEngine


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0-incubating
    • Component/s: None
    • Labels: None

      Description

      Currently the KeywordLinkingEngine cannot be used to match against alphanumeric IDs of the kind often used for products. This is because the Tokenizers used by OpenNLP tend to split such IDs into several small tokens, which prevents a correct mapping against such IDs.

      The simplest solution is to implement a simple Tokenizer that is optimized for extracting Keywords. Such a Tokenizer should split only on whitespace.
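
The idea of splitting only on whitespace can be illustrated with a minimal sketch. This is not the code committed for this issue and does not use the Stanbol or OpenNLP APIs; the class name `WhitespaceKeywordTokenizer` and its `tokenize` method are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Stanbol or OpenNLP.
final class WhitespaceKeywordTokenizer {

    private WhitespaceKeywordTokenizer() {}

    /**
     * Splits the text on runs of whitespace only. Hyphens, slashes, colons
     * and digits stay inside a token, so IDs and URNs are kept in one piece.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start offset of the current token, -1 while between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    // whitespace closes the token that is currently open
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                // first non-whitespace character opens a new token
                start = i;
            }
        }
        if (start >= 0) {
            // the text ended while a token was still open
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}
```

With this approach an input such as `AB-1234/X9` or `urn:isbn:0451450523` stays a single token. The trade-off is that punctuation directly attached to a word (for example a trailing full stop) also stays part of the token, so a real implementation would likely need some additional handling for that case.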

            People

            • Assignee: Rupert Westenthaler (rwesten)
            • Reporter: Rupert Westenthaler (rwesten)
