[LUCENE-1758] improve arabic analyzer: light8 -> light10 - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9
Component/s: modules/analysis
Labels: None
Lucene Fields: New, Patch Available

Description

Someone mentioned on the java-user list that the Arabic analysis was not as good as they would like.

This patch adds removal of the لل- prefix, which moves the stemmer from the light8 algorithm to the light10 algorithm.
According to the light10 paper, this raises precision from .390 to .413.
The authors note that the difference is not statistically significant, but the change makes linguistic sense and has at least been shown not to hurt.
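
To illustrate what the added prefix changes, here is a minimal sketch of light-stemmer prefix stripping. It is not the code from the attached patch: the class name `LightPrefixSketch`, the method `stripPrefix`, the two prefix lists, and the "leave at least two characters" rule are assumptions made for illustration, and the sketch omits the separate handling of the و- conjunction and all suffix removal.

```java
import java.util.List;

public class LightPrefixSketch {
    // Assumed definite-article prefix lists, for illustration only.
    // The only difference between the two is the extra "لل" entry.
    static final List<String> LIGHT8 = List.of("وال", "بال", "كال", "فال", "ال");
    static final List<String> LIGHT10 = List.of("وال", "بال", "كال", "فال", "ال", "لل");

    // Strip the first matching prefix, but only if at least two
    // characters of the word remain afterwards (assumed threshold).
    static String stripPrefix(String word, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (word.startsWith(prefix) && word.length() - prefix.length() >= 2) {
                return word.substring(prefix.length());
            }
        }
        return word; // no prefix matched, or the remainder would be too short
    }

    public static void main(String[] args) {
        String word = "للكتاب"; // "for the book"
        System.out.println(stripPrefix(word, LIGHT8));  // unchanged: no light8 prefix matches
        System.out.println(stripPrefix(word, LIGHT10)); // "لل" removed, leaving "كتاب"
    }
}
```

With the light8-style list the word is left as is, so it would not match documents indexed under كتاب; with the light10-style list both forms reduce to the same term.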

In the future, I hope openrelevance will allow us to try some more approaches.

Attachments

- LUCENE-1758.patch, 04/Aug/09 23:57, 11 kB, Robert Muir
- LUCENE-1758.patch, 30/Jul/09 04:58, 10 kB, Robert Muir
- LUCENE-1758.patch, 26/Jul/09 23:18, 7 kB, Robert Muir
- LUCENE-1758.txt, 23/Jul/09 18:00, 2 kB, Robert Muir

People

Assignee: Robert Muir
Reporter: Robert Muir
Votes: 0
Watchers: 0

Dates

Created: 23/Jul/09 17:59
Updated: 28/Aug/22 12:04
Resolved: 05/Aug/09 18:27