LUCENE-9043 (Lucene - Core)

Currently, Lucene does not have an analyzer for Sinhala. We have built one that consists of a language-dependent tokenizer, a stemming algorithm, and a list of stop words.


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.3
    • Fix Version/s: 5.5.6
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This component was developed based on three main research studies.

      As its name implies, Sinhala Analyzer is a software library for analyzing documents written in the Sinhala language. It was implemented by performing morphological analysis of Sinhala. Its three main functions are to tokenize document content precisely, to remove stop words, and to reduce terms to their base (root) form accurately.
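      The three stages described above (tokenize, remove stop words, reduce to a root form) can be illustrated with a small plain-Java sketch. It is not the attached SinhalaAnalyzer, SinhalaTokenizer, or SinhalaStemmer, and it does not use Lucene's Analyzer API. The following are all assumptions made for illustration: the class name SinhalaPipelineSketch, the mark-aware tokenizing rule, the longest-suffix stripping with a minimum stem length of 2, and the sample stop word and suffix. The sketch shows why Sinhala benefits from a language-aware tokenizer. Sinhala vowel signs and the virama are Unicode combining marks, not letters, so a splitter that keeps only letters would cut words apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Illustrative three-stage pipeline: tokenize, drop stop words, strip suffixes.
 * Hypothetical sketch only; not the attached SinhalaAnalyzer and not Lucene's API.
 */
public class SinhalaPipelineSketch {

    /** True for code points that belong inside a word: letters, digits, combining marks, and ZWJ. */
    static boolean isWordChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.NON_SPACING_MARK        // e.g. the virama (al-lakuna)
                || type == Character.COMBINING_SPACING_MARK  // e.g. many vowel signs
                || cp == 0x200D;                             // ZERO WIDTH JOINER, used in conjuncts
    }

    /** Splits text into tokens, keeping combining marks attached to their base letter. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isWordChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Removes the longest matching suffix, as long as at least minStem code points remain. */
    static String stripSuffix(String token, List<String> suffixes, int minStem) {
        String best = "";
        for (String s : suffixes) {
            if (token.endsWith(s) && s.length() > best.length()
                    && token.codePointCount(0, token.length() - s.length()) >= minStem) {
                best = s;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    /** Runs the three stages in order and returns the resulting terms. */
    static List<String> analyze(String text, Set<String> stopWords, List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue; // stage 2: stop-word removal
            }
            terms.add(stripSuffix(token, suffixes, 2)); // stage 3: suffix stripping
        }
        return terms;
    }

    public static void main(String[] args) {
        // Placeholder data chosen only to exercise the code; not taken from
        // the attached stopwords.txt or SinhalaStemmer.java.
        Set<String> stopWords = Set.of("සහ");
        List<String> suffixes = List.of("ට");
        System.out.println(analyze("ගෙදරට සහ පාසලට.", stopWords, suffixes)); // prints [ගෙදර, පාසල]
    }
}
```

      In a real Lucene module, these stages would normally be wired up inside an Analyzer's createComponents method, as a Tokenizer followed by TokenFilters. The sketch keeps them as plain static methods so that it runs without Lucene on the classpath.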

        Attachments

        1. SinhalaAnalyzer.java
          4 kB
          pavithra kariyawasam
        2. SinhalaTokenizer.java
          12 kB
          pavithra kariyawasam
        3. stopwords.txt
          2 kB
          pavithra kariyawasam
        4. SinhalaStemmer.java
          25 kB
          pavithra kariyawasam

              People

              • Assignee:
                Unassigned
              • Reporter:
                pavithraK pavithra kariyawasam
              • Votes:
                0
              • Watchers:
                0
