  1. Lucene - Core
  2. LUCENE-9044

Lucene currently has no analyzer for Sinhala. We have built an analyzer that consists of a language-dependent tokenizer, a stemming algorithm, and a list of stop words.


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 5.5.6
    • Component/s: modules/analysis
    • Labels:
    • Environment:

      Lucene

    • Lucene Fields:
      New

      Description

      This component is based on three main research studies.
      Lucene has no component for analyzing Sinhala documents, so our intention is to fill that gap with an analyzer for Sinhala. The Sinhala Analyzer is implemented by performing Sinhala morphological analysis. Its three main functions are tokenizing the document content precisely, removing stop words, and converting terms to their base/root form accurately. All three are built on the grammatical rules of Sinhala.
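      The three stages described above can be sketched in plain Java, without the Lucene API, to show the order in which they run. This is only an illustration under stated assumptions and not the code in the attached files: the class name `SinhalaPipelineSketch`, the separator rule, the single stop word, and the single suffix are all hypothetical placeholders. The attached `SinhalaTokenizer.java`, `SinhalaStemmer.java`, and `stopwords.txt` presumably apply much richer grammatical rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of a tokenize -> stop-word filter -> stem pipeline.
// Not the attached implementation; all rules here are placeholders.
final class SinhalaPipelineSketch {

    // Assumption: a token is a run of letters, combining marks (Sinhala vowel
    // signs are marks), or the zero-width joiner U+200D used in conjuncts.
    private static final Pattern SEPARATORS =
            Pattern.compile("[^\\p{L}\\p{M}\\u200D]+");

    // Stage 1: split the text on separator runs and drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATORS.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 3: strip the longest matching suffix, but never the whole token.
    static String stem(String token, List<String> suffixes) {
        String longest = "";
        for (String suffix : suffixes) {
            boolean leavesStem = token.length() > suffix.length();
            if (leavesStem && token.endsWith(suffix)
                    && suffix.length() > longest.length()) {
                longest = suffix;
            }
        }
        return token.substring(0, token.length() - longest.length());
    }

    // Full pipeline: tokenize, drop stop words (stage 2), then stem.
    static List<String> analyze(String text, Set<String> stopWords,
                                List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            terms.add(stem(token, suffixes));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Three Sinhala-script words written as Unicode escapes so the file
        // compiles under any source encoding; the middle one is "saha" ("and").
        String text = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD \u0DC3\u0DC4, "
                + "\u0DB7\u0DCF\u0DC2\u0DCF\u0DC0!";
        // Placeholder stop-word list and suffix list, chosen for illustration.
        Set<String> stopWords = new HashSet<>(Arrays.asList("\u0DC3\u0DC4"));
        List<String> suffixes = Arrays.asList("\u0DC0");
        System.out.println(analyze(text, stopWords, suffixes));
    }
}
```

      In a real Lucene analyzer these stages would normally be a `Tokenizer` followed by `TokenFilter`s wired together in `Analyzer.createComponents`; the sketch keeps them as plain static methods so the data flow is visible without any library dependency.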

        Attachments

        1. SinhalaAnalyzer.java
          4 kB
          pavithra kariyawasam
        2. SinhalaStemmer.java
          25 kB
          pavithra kariyawasam
        3. SinhalaTokenizer.java
          12 kB
          pavithra kariyawasam
        4. stopwords.txt
          2 kB
          pavithra kariyawasam


            People

            • Assignee:
              Unassigned
              Reporter:
              pavithraK pavithra kariyawasam

              Dates

              • Created:
                Updated:
