OpenNLP / OPENNLP-141

Tokenizer's alphanumeric optimization only recognizes a-z as alphabetic characters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: tools-1.5.0-sourceforge
    • Fix Version/s: 2.2.0
    • Component/s: Tokenizer
    • Labels: None

    Description

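      The Tokenizer has an optimization that skips tokens made up only of digits and letters: such tokens are kept whole instead of being split further. However, this check only treats a-z as letters, while many languages other than English also use umlauts and other letters outside that range, so tokens containing them are never recognized as alphanumeric and miss the optimization.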
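      To make the problem concrete, here is a minimal Java sketch, assuming the optimization is a regular-expression check applied to each whitespace-separated token. The class and method names (AlphaNumericCheck, isAsciiAlphaNumeric, isUnicodeAlphaNumeric) are hypothetical and not part of the OpenNLP API, and the Unicode-aware pattern is one possible fix, not necessarily the one committed for this issue.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the alphanumeric check described in this issue.
// Not OpenNLP's actual source; the names and patterns are assumptions.
public class AlphaNumericCheck {

    // ASCII-only character class: umlauts, sharp s, accented letters, etc. never match.
    static final Pattern ASCII_ALPHANUMERIC = Pattern.compile("[A-Za-z0-9]+");

    // Unicode-aware character class: \p{L} is any letter, \p{Nd} any decimal digit.
    static final Pattern UNICODE_ALPHANUMERIC = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // matches() succeeds only if the whole token matches, so no anchors are needed.
    public static boolean isAsciiAlphaNumeric(String token) {
        return ASCII_ALPHANUMERIC.matcher(token).matches();
    }

    public static boolean isUnicodeAlphaNumeric(String token) {
        return UNICODE_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only (German words with umlaut and sharp s).
        String[] tokens = {"Haus", "M\u00fcller", "Stra\u00dfe", "2011", "U.S."};
        for (String token : tokens) {
            System.out.printf("%-8s ascii=%-5b unicode=%b%n",
                    token, isAsciiAlphaNumeric(token), isUnicodeAlphaNumeric(token));
        }
    }
}
```

      With this sketch, "Müller" and "Straße" fail the ASCII check but pass the Unicode-aware one, while "U.S." fails both because of the periods. Note that \p{L} does not match combining marks, so decomposed input (for example "u" followed by a combining diaeresis) would still fail unless the text is first normalized to NFC, e.g. with java.text.Normalizer.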

    People

        Martin Wiesner (mawiesne)
        Jörn Kottmann (joern)
        Votes: 1
        Watchers: 2
