  1. Lucene - Core
  2. LUCENE-503

Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None

      Description

      Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm is used to break a string into words. For Lucene to be usable for Thai, an Analyzer that knows how to break Thai words is needed.

      I've implemented such an Analyzer, ThaiAnalyzer, using ICU4J's DictionaryBasedBreakIterator for word breaking. I'll upload the code later.

      I'm normally a C++ programmer and very new to Java. Please review the code for any problems. One possible problem is that it requires ICU4J. I don't know whether this dependency is OK.
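      For reviewers unfamiliar with break-iterator-based word splitting, here is a minimal sketch of the idea. It is not the attached code: it swaps in the JDK's own java.text.BreakIterator instead of ICU4J's DictionaryBasedBreakIterator so that it runs without extra dependencies. The class name ThaiSegmentSketch and the method segment are hypothetical names chosen for illustration. Whether Thai text is actually split into words depends on the JDK shipping Thai locale data; without it, a run of Thai characters may come back as a single piece.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the ThaiAnalyzer attached to this issue.
class ThaiSegmentSketch {

    /**
     * Splits text at the word boundaries reported by a locale-specific
     * BreakIterator and drops whitespace-only pieces.
     */
    static List<String> segment(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample Thai sentence with no spaces between words.
        String thai = "การที่ได้ต้องแสดงว่างานดี";
        System.out.println(segment(thai, Locale.forLanguageTag("th")));
    }
}
```

      In a Lucene analyzer, a token filter would typically apply the same boundary walk to each incoming token and emit one token per piece, adjusting offsets accordingly.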

        Attachments

        1. TestThaiAnalyzer.java
          2 kB
          Samphan Raruenrom
        2. ThaiWordFilter.java
          2 kB
          Samphan Raruenrom
        3. ThaiAnalyzer.java
          1 kB
          Samphan Raruenrom

            People

            • Assignee:
              hossman Hoss Man
              Reporter:
              samphan Samphan Raruenrom
            • Votes:
              1
              Watchers:
              3