Lucene - Core / LUCENE-185

[PATCH] Thai Analysis Enhancement

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Operating System: All, Platform: All
    • Bugzilla Id: 27182

    Description

      Unlike many other languages, Thai does not mark word boundaries within a
      sentence: words are written consecutively without a delimiter. The Lucene
      StandardTokenizer currently cannot tokenize a Thai sentence and returns the
      whole sentence as a single token. A special tokenizer that breaks Thai
      sentences into words is required.
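One common way to find Thai word boundaries (not necessarily the approach taken by the patch attached to this issue) is dictionary-based longest matching: at each position, take the longest dictionary word that starts there. The following is a minimal plain-Java sketch under stated assumptions: the class `LongestMatchSegmenter` and its methods are hypothetical, it is not a Lucene `Tokenizer`, and the three-word dictionary (ฉัน "I", กิน "eat", ข้าว "rice") is illustrative only. Characters that do not begin any dictionary word are grouped into one "unknown" token. The Thai strings are written as Unicode escapes so the source file stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical greedy longest-matching word segmenter for text written
 * without spaces, such as Thai. Not part of Lucene; the dictionary is
 * supplied by the caller.
 */
class LongestMatchSegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    LongestMatchSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 0;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Splits text into dictionary words; each run of unknown characters becomes one token. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder unknown = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            // Try the longest candidate first and shrink until a dictionary word matches.
            String match = null;
            for (int end = Math.min(text.length(), pos + maxWordLength); end > pos; end--) {
                String candidate = text.substring(pos, end);
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                // No known word starts here: add the character to the current unknown run.
                unknown.append(text.charAt(pos));
                pos++;
            } else {
                flush(unknown, tokens);
                tokens.add(match);
                pos += match.length();
            }
        }
        flush(unknown, tokens);
        return tokens;
    }

    /** Emits the pending unknown run (if any) as a single token. */
    private static void flush(StringBuilder unknown, List<String> tokens) {
        if (unknown.length() > 0) {
            tokens.add(unknown.toString());
            unknown.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Illustrative dictionary: "chan" (I), "kin" (eat), "khao" (rice).
        Set<String> dict = Set.of("\u0E09\u0E31\u0E19", "\u0E01\u0E34\u0E19", "\u0E02\u0E49\u0E32\u0E27");
        LongestMatchSegmenter segmenter = new LongestMatchSegmenter(dict);
        // "chan kin khao" ("I eat rice") written without spaces, as in Thai text.
        System.out.println(segmenter.segment("\u0E09\u0E31\u0E19\u0E01\u0E34\u0E19\u0E02\u0E49\u0E32\u0E27"));
    }
}
```

With this dictionary, `segment` splits ฉันกินข้าว ("I eat rice") into the three tokens ฉัน, กิน and ข้าว. Greedy longest matching is simple, but it can mis-segment when the longest match at one position prevents a valid split of the remaining text, so practical Thai segmenters usually rely on larger dictionaries and more careful matching strategies. Wrapping this logic in a Lucene tokenizer would depend on the analysis API of the Lucene version in use.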

    People

    • Assignee: Unassigned
    • Reporter: Pichai Ongvasith (pichaio@yahoo.com)
    • Votes: 0
    • Watchers: 0

    Dates

    • Created:
    • Updated:
    • Resolved: