[LUCENE-9043] Lucene currently has no analyzer for Sinhala. We have built an analyzer that consists of a language-dependent tokenizer, a stemming algorithm, and a list of stop words. - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 8.3
Fix Version/s: 5.5.6
Component/s: modules/analysis
Labels: None
Lucene Fields: New

Description

This component was developed based on three main research studies.

Sinhala Analyzer, as its name implies, is a software library for analyzing documents written in the Sinhala language. It is implemented by performing Sinhala morphological analysis. Its three main functions are tokenizing the document content, removing stop words, and converting terms to their base (root) form.
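
The three functions described above can be sketched as a small pipeline. This is only an illustration in plain Java, not the attached implementation: it does not use Lucene's Analyzer or TokenStream classes, the class name AnalyzerSketch and its methods are hypothetical, and the stop words and suffixes are ASCII placeholders rather than the contents of stopwords.txt or the rules in SinhalaStemmer.java. The tokenizer keeps combining marks and the zero-width joiner inside tokens, on the assumption that Sinhala vowel signs and conjuncts should not split a word. The stemmer simply strips the longest matching suffix, which is a stand-in for real morphological rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tokenize -> remove stop words -> strip suffixes.
class AnalyzerSketch {

    // Letters, digits, combining marks (e.g. vowel signs) and ZWJ stay inside a token.
    static boolean isTokenChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || cp == 0x200D; // zero-width joiner
    }

    // Split text into maximal runs of token characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Strip the longest listed suffix, keeping at least two chars of the stem.
    static String stem(String token, List<String> suffixes) {
        String best = "";
        for (String suffix : suffixes) {
            boolean longer = suffix.length() > best.length();
            boolean leavesStem = token.length() - suffix.length() >= 2;
            if (longer && leavesStem && token.endsWith(suffix)) {
                best = suffix;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    // Full pipeline: tokens that are stop words are dropped, the rest are stemmed.
    static List<String> analyze(String text, Set<String> stopWords, List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!stopWords.contains(token)) {
                terms.add(stem(token, suffixes));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Placeholder data only; real use would load the attached stop word list.
        Set<String> stopWords = Set.of("and");
        List<String> suffixes = List.of("s");
        System.out.println(analyze("cats and dogs", stopWords, suffixes)); // prints [cat, dog]
    }
}
```

Passing the stop word set and suffix list in as parameters keeps the sketch free of any claim about which Sinhala words or suffixes the attached files actually contain.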

Attachments

SinhalaAnalyzer.java, 13/Nov/19 06:17, 4 kB, pavithra kariyawasam
SinhalaStemmer.java, 13/Nov/19 06:17, 25 kB, pavithra kariyawasam
SinhalaTokenizer.java, 13/Nov/19 06:17, 12 kB, pavithra kariyawasam
stopwords.txt, 13/Nov/19 06:17, 2 kB, pavithra kariyawasam

Issue Links

relates to: LUCENE-9044 Lucene currently has no analyzer for Sinhala. We have built an analyzer that consists of a language-dependent tokenizer, a stemming algorithm, and a list of stop words. (Open)

People

Assignee: Unassigned

Reporter: pavithra kariyawasam

Votes: 0

Watchers: 0

Dates

Created: 13/Nov/19 06:19

Updated: 28/Aug/22 15:52