  1. Lucene - Core
  2. LUCENE-1758

Improve Arabic analyzer: light8 -> light10

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Someone mentioned on the java-user list that the Arabic analysis was not as good as they would like.

      This patch adds the لل- prefix (moving from the light8 algorithm to the light10 algorithm).
      In the light10 paper, this improves precision from .390 to .413.
      The authors note that the difference is not statistically significant, but it makes linguistic sense and has at least been shown not to hurt.
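
      As a rough illustration of what adding the لل- prefix changes, here is a minimal sketch of light prefix stripping. It is not the code from the attached patch: the class name `LightPrefixSketch`, the two prefix arrays, and the minimum-remaining-length rule (3 characters after و, 2 after the other prefixes) are assumptions made for this example, and suffix removal and normalization are left out.

```java
public class LightPrefixSketch {
    // Hypothetical prefix tables for illustration, longer prefixes listed first.
    // light8-style list: wal-, bal-, kal-, fal-, al-, w-
    static final String[] LIGHT8 = {
        "\u0648\u0627\u0644", // wal-
        "\u0628\u0627\u0644", // bal-
        "\u0643\u0627\u0644", // kal-
        "\u0641\u0627\u0644", // fal-
        "\u0627\u0644",       // al-
        "\u0648"              // w-
    };

    // light10-style list: the same prefixes plus ll- (lam lam)
    static final String[] LIGHT10 = {
        "\u0648\u0627\u0644", // wal-
        "\u0628\u0627\u0644", // bal-
        "\u0643\u0627\u0644", // kal-
        "\u0641\u0627\u0644", // fal-
        "\u0627\u0644",       // al-
        "\u0644\u0644",       // ll- (the prefix this issue adds)
        "\u0648"              // w-
    };

    // Strip the first matching prefix, but only if enough characters remain.
    // Assumed rule: at least 3 characters must remain after the
    // single-letter prefix, at least 2 after any other prefix.
    static String stripPrefix(String word, String[] prefixes) {
        for (String p : prefixes) {
            int minRemaining = (p.length() == 1) ? 3 : 2;
            if (word.startsWith(p) && word.length() - p.length() >= minRemaining) {
                return word.substring(p.length());
            }
        }
        return word; // no prefix applied
    }

    public static void main(String[] args) {
        // lam lam kaf teh alef beh: "for the book"
        String word = "\u0644\u0644\u0643\u062A\u0627\u0628";
        System.out.println(stripPrefix(word, LIGHT8));
        System.out.println(stripPrefix(word, LIGHT10));
    }
}
```

      With the light8-style list the sample word is left unchanged, because none of its prefixes match. With the light10-style list the leading لل is removed, so the word is reduced to the same form as the bare noun.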

      In the future, I hope the Open Relevance Project will allow us to try some more approaches.

        Attachments

        1. LUCENE-1758.patch
          11 kB
          Robert Muir
        2. LUCENE-1758.patch
          10 kB
          Robert Muir
        3. LUCENE-1758.patch
          7 kB
          Robert Muir
        4. LUCENE-1758.txt
          2 kB
          Robert Muir

            People

            • Assignee:
              rcmuir Robert Muir
              Reporter:
              rcmuir Robert Muir
