Stanbol (Retired): STANBOL-875

Add support for Paoding (Chinese)


Details

    Description

      [paoding-analysis](http://code.google.com/p/paoding/) is a Solr/Lucene tokenizer for Chinese. It represents an alternative to the Smartcn analyzer included in the Solr Analysis Extras.

      To allow the use of Paoding for processing Chinese text, the following are needed:

      (1) An extension to the Stanbol Commons Solr module: essentially an OSGi bundle version of the Paoding analyzers. Because Paoding depends heavily on system environment variables and file system paths, some adaptations to its initialization are necessary.

      (2) A LabelTokenizer implementation for the EntityLinkingEngine: this is needed to tokenize the labels of entities.

      (3) A Tokenizer Enhancement Engine: this will retrieve or create the AnalyzedText content part, tokenize the text passed to the Stanbol Enhancer, and add Tokens for the detected words to the AnalyzedText.
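      Items (2) and (3) both come down to splitting unsegmented Chinese text into words and recording where each word sits in the text. The sketch below illustrates only that shape, under stated assumptions. It does not use Paoding. Instead it substitutes plain forward maximum matching (greedy longest dictionary match) over a tiny in-memory dictionary. The class name, the Token record, and tokenizeLabel are hypothetical and are not Stanbol or Paoding APIs. tokenize returns words with character offsets, the kind of information a Tokenizer engine would add to the AnalyzedText. tokenizeLabel returns only the words, as a LabelTokenizer would for entity labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration only: NOT Paoding's algorithm and NOT a Stanbol API.
 * Segments text by forward maximum matching against a small in-memory dictionary.
 */
public class ForwardMaxMatchTokenizer {

    /** A word with [start, end) char offsets (hypothetical stand-in for an AnalyzedText Token). */
    public record Token(int start, int end, String text) {}

    private final Set<String> dictionary;
    private final int maxWordLength; // longest dictionary entry, in chars

    public ForwardMaxMatchTokenizer(Set<String> dictionary) {
        this.dictionary = Set.copyOf(dictionary);
        int max = 1;
        for (String word : this.dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Greedy longest match; unknown characters become single-code-point tokens. */
    public List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int cp = text.codePointAt(pos);
            if (Character.isWhitespace(cp)) { // whitespace separates but is not a token
                pos += Character.charCount(cp);
                continue;
            }
            // Fallback: one code point, so surrogate pairs are never split.
            int end = pos + Character.charCount(cp);
            int limit = Math.min(text.length(), pos + maxWordLength);
            // Try the longest candidate first and stop at the first dictionary hit.
            for (int candidateEnd = limit; candidateEnd > end; candidateEnd--) {
                if (dictionary.contains(text.substring(pos, candidateEnd))) {
                    end = candidateEnd;
                    break;
                }
            }
            tokens.add(new Token(pos, end, text.substring(pos, end)));
            pos = end;
        }
        return tokens;
    }

    /** LabelTokenizer-style helper (hypothetical signature): words only, no offsets. */
    public String[] tokenizeLabel(String label) {
        return tokenize(label).stream().map(Token::text).toArray(String[]::new);
    }

    public static void main(String[] args) {
        ForwardMaxMatchTokenizer tokenizer =
                new ForwardMaxMatchTokenizer(Set.of("北京", "北京大学", "大学", "学生"));
        for (Token t : tokenizer.tokenize("北京大学学生")) {
            System.out.println(t.start() + "-" + t.end() + " " + t.text());
        }
    }
}
```

      With the dictionary in main, this prints `0-4 北京大学` and `4-6 学生` (end offsets are exclusive). Greedy longest match is the simplest dictionary strategy and can mis-segment ambiguous strings. A real engine would delegate segmentation to the Paoding analyzer and only reuse the offset bookkeeping shown here.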


            People

              Assignee: Rupert Westenthaler (rwesten)
              Reporter: Rupert Westenthaler (rwesten)
              Votes: 0
              Watchers: 1
