  1. Lucene - Core
  2. LUCENE-8462

New Arabic snowball stemmer


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • None
    • 7.6, 8.0
    • None
    • New

    Description

      Added a new Arabic snowball stemmer based on https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl

      Also added an Arabic test dataset to `TestSnowballVocabData.zip`, taken from snowball-data and generated from the input file available here: https://github.com/snowballstem/snowball-data/tree/master/arabic

      https://github.com/ibnmalik/golden-corpus-arabic/blob/develop/core/words.txt

       

      It also updates the ant patch-snowball target to be compatible with
      the Java classes generated by the latest Snowball version (tree:
      1964ce688cbeca505263c8f77e16ed923296ce7a). The target remains
      backward compatible with the version of the Snowball stemmers used in
      Lucene 7.x and skips classes that are already patched.
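      To illustrate what "skips classes that are already patched" means, here is a minimal, stdlib-only Java sketch of an idempotent source patch. It is not the ant target itself: the class name `PatchSketch`, the `patch` helper, and the `@SuppressWarnings("unused")` marker are assumptions chosen for illustration, and the real replacements live in the build file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: patch a generated stemmer source only once.
final class PatchSketch {
    // Assumed marker that identifies an already patched class (illustrative only).
    private static final String MARKER = "@SuppressWarnings(\"unused\")";
    // Matches a top-level class declaration at the start of a line.
    private static final Pattern CLASS_DECL = Pattern.compile("(?m)^public class ");

    // Returns the patched source, or the input unchanged if it is already patched.
    static String patch(String source) {
        if (source.contains(MARKER)) {
            return source; // already patched: leave the class alone
        }
        Matcher m = CLASS_DECL.matcher(source);
        return m.replaceFirst(Matcher.quoteReplacement(MARKER + "\npublic class "));
    }

    public static void main(String[] args) {
        String generated =
            "package org.tartarus.snowball.ext;\n\npublic class ArabicStemmer {\n}\n";
        String once = patch(generated);
        String twice = patch(once); // second run must be a no-op
        System.out.println(once);
        System.out.println("idempotent: " + once.equals(twice));
    }
}
```

      Checking for the marker before rewriting is what lets the same patch step run against both freshly generated classes and classes patched by an earlier run.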

       

      Link to the corresponding Github PR:
      https://github.com/apache/lucene-solr/pull/449

       Edited: updated the corpus link, PR link and description

       


          People

            Assignee: Unassigned
            Reporter: Ryadh Dahimene (ryadh)
            Votes: 0
            Watchers: 4


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 50m
