Description
Added a new Arabic Snowball stemmer based on https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
It also adds an Arabic test dataset to `TestSnowballVocabData.zip`, taken from snowball-data (https://github.com/snowballstem/snowball-data/tree/master/arabic), which was generated from the input file available here:
https://github.com/ibnmalik/golden-corpus-arabic/blob/develop/core/words.txt
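A vocabulary test of this kind compares each input word with its expected stem, line by line. The sketch below assumes the snowball-data layout of two line-aligned lists (inputs from `voc.txt`, expected stems from `output.txt`). The class name `VocabCheck` and the toy stemmer are hypothetical stand-ins for the real Arabic stemmer. This is not Lucene's test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class VocabCheck {
    // Compares stemmer output against expected stems for line-aligned
    // vocabulary/output lists and returns a description of every mismatch.
    // The stemmer is passed in as a function; here it is a hypothetical stand-in.
    public static List<String> mismatches(List<String> voc, List<String> output,
                                          UnaryOperator<String> stemmer) {
        if (voc.size() != output.size()) {
            throw new IllegalArgumentException(
                "voc and output must have the same number of lines");
        }
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < voc.size(); i++) {
            String actual = stemmer.apply(voc.get(i));
            if (!actual.equals(output.get(i))) {
                errors.add("line " + (i + 1) + ": " + voc.get(i)
                    + " -> " + actual + " (expected " + output.get(i) + ")");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Toy stemmer for illustration only: strips a leading "ال" (the Arabic definite article).
        UnaryOperator<String> toy = w -> w.startsWith("ال") ? w.substring(2) : w;
        List<String> voc = List.of("الكتاب", "كتاب");
        List<String> expected = List.of("كتاب", "كتاب");
        System.out.println(mismatches(voc, expected, toy));
    }
}
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to see how far a stemmer's behavior has drifted from the reference output.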
In addition, it updates the Ant `patch-snowball` target to be compatible with
the Java classes generated by the latest Snowball version (tree:
1964ce688cbeca505263c8f77e16ed923296ce7a). The `patch-snowball` target
remains backward-compatible with the version of the Snowball stemmers used in
Lucene 7.x, and it skips classes that have already been patched.
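One common way to make a patch step skip classes it has already modified is to check for a marker before rewriting, so running it twice is a no-op. The Java sketch below illustrates only that idea. The real target is an Ant build step, and the marker string and the `GeneratedBase`/`LuceneBase` class names here are hypothetical.

```java
import java.util.regex.Pattern;

public class IdempotentPatch {
    // Hypothetical marker recorded in a source file once it has been patched.
    static final String MARKER = "// patched";

    // Hypothetical rewrite: swap the generated base class for a project-specific one.
    static final Pattern BASE = Pattern.compile("\\bextends\\s+GeneratedBase\\b");

    // Rewrites the source once. Input that already carries the marker is
    // returned unchanged, so applying the patch a second time does nothing.
    public static String patch(String source) {
        if (source.contains(MARKER)) {
            return source;
        }
        String rewritten = BASE.matcher(source).replaceAll("extends LuceneBase");
        return MARKER + "\n" + rewritten;
    }

    public static void main(String[] args) {
        String generated = "public class ArabicStemmer extends GeneratedBase {}";
        String once = patch(generated);
        String twice = patch(once);
        System.out.println(once.equals(twice));
        System.out.println(once);
    }
}
```

A marker check keeps the step safe to rerun on a tree that mixes freshly generated and previously patched classes.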
Link to the corresponding GitHub PR:
https://github.com/apache/lucene-solr/pull/449
Edited: updated the corpus link, PR link and description
Issue Links
- duplicates: LUCENE-8336 Refresh Snowball stemming module to add Arabic stemmer (Resolved)