Lucene - Core / LUCENE-8462

New Arabic snowball stemmer


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.6, 8.0
    • Labels: None
    • Lucene Fields: New

    Description

      Adds a new Arabic Snowball stemmer based on the Snowball algorithm definition at https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl

      It also adds an Arabic test dataset to `TestSnowballVocabData.zip`, taken from the snowball-data repository (https://github.com/snowballstem/snowball-data/tree/master/arabic). That data was generated from the following input file:

      https://github.com/ibnmalik/golden-corpus-arabic/blob/develop/core/words.txt
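      The new stemmer can be exercised on its own. This is a minimal sketch. It assumes Lucene 7.6 or later, where this change shipped, with the analyzers-common jar on the classpath. It also assumes that the stemmer is the generated class `org.tartarus.snowball.ext.ArabicStemmer`. The class `ArabicStemDemo` and the sample words are hypothetical illustrations. The actual stems are not shown because they depend on the algorithm's rules.

```java
import org.tartarus.snowball.ext.ArabicStemmer;

// Hypothetical demo class: runs the Snowball Arabic stemmer on a few
// sample words. Assumes Lucene 7.6+ (analyzers-common) on the classpath.
public class ArabicStemDemo {

    // Stems a single word and returns the result.
    static String stem(ArabicStemmer stemmer, String word) {
        stemmer.setCurrent(word);    // load the word into the stemmer's buffer
        stemmer.stem();              // apply the Snowball Arabic rules in place
        return stemmer.getCurrent(); // read back the stemmed form
    }

    public static void main(String[] args) {
        // One instance can be reused sequentially; it keeps mutable state,
        // so it should not be shared across threads.
        ArabicStemmer stemmer = new ArabicStemmer();
        String[] words = {"الكتاب", "كتابات", "المدرسة"};
        for (String word : words) {
            System.out.println(word + " -> " + stem(stemmer, word));
        }
    }
}
```

      Inside an analysis chain, the same stemmer should be reachable by name as `new SnowballFilter(tokenStream, "Arabic")`. This works because `SnowballFilter` resolves `org.tartarus.snowball.ext.<name>Stemmer` reflectively.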

       

      It also updates the ant `patch-snowball` target so that it works with
      the Java classes generated by the latest Snowball version (tree:
      1964ce688cbeca505263c8f77e16ed923296ce7a). The `patch-snowball` target
      remains backward-compatible with the version of the Snowball stemmers
      used in Lucene 7.x, and it skips classes that have already been patched.

       

      Link to the corresponding GitHub PR:
      https://github.com/apache/lucene-solr/pull/449

       Edited: updated the corpus link, PR link and description

       


    People

      Assignee: Unassigned
      Reporter: Ryadh Dahimene
      Votes: 0
      Watchers: 4


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 50m