Lucene - Core / LUCENE-4019

Parsing Hunspell affix rules without regexp condition


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0-ALPHA, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      We found that some recent Dutch Hunspell dictionaries contain suffix or prefix rules like the following:

       
      SFX Na N 1
      SFX Na 0 ste
      

      The rule on the second line lacks the fifth field, which should be the condition (usually a regexp). The condition is typically '.', meaning the rule always applies (any character matches). As explained in LUCENE-3976, the readAffix method throws an error on such lines. I wonder whether we should treat the missing value as a default, such as '.'. On the other hand, I haven't found anything about this in the spec. Any thoughts?
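
To make the proposal concrete, here is a minimal sketch of the lenient behaviour: when a rule line has only four fields, the condition falls back to '.'. This is not the actual readAffix code or the attached patch; the class AffixRuleSketch, its Rule type, and the parseRule method are hypothetical names used only for illustration. The sketch assumes the header line (e.g. "SFX Na N 1") has already been consumed, and that fields are separated by whitespace.

```java
// Hypothetical illustration, not Lucene's implementation: parses one Hunspell
// affix rule line and defaults the condition to "." when it is missing.
final class AffixRuleSketch {

  /** Condition used when the rule line omits the fifth field (assumed default). */
  static final String DEFAULT_CONDITION = ".";

  /** The fields of a single rule line such as "SFX Na 0 ste .". */
  static final class Rule {
    final String type;      // "SFX" or "PFX"
    final String flag;      // e.g. "Na"
    final String strip;     // characters to strip; empty when the file says "0"
    final String affix;     // characters to append or prepend
    final String condition; // regexp-like condition, "." meaning "always"

    Rule(String type, String flag, String strip, String affix, String condition) {
      this.type = type;
      this.flag = flag;
      this.strip = strip;
      this.affix = affix;
      this.condition = condition;
    }
  }

  /** Parses a rule line, tolerating a missing condition field. */
  static Rule parseRule(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length < 4) {
      // Type, flag, strip and affix are still required.
      throw new IllegalArgumentException("Too few fields in affix rule: " + line);
    }
    // "0" in the strip position means nothing is stripped.
    String strip = parts[2].equals("0") ? "" : parts[2];
    // The proposed leniency: fall back to "." when the fifth field is absent.
    String condition = parts.length > 4 ? parts[4] : DEFAULT_CONDITION;
    return new Rule(parts[0], parts[1], strip, parts[3], condition);
  }
}
```

With this sketch, parseRule("SFX Na 0 ste") and parseRule("SFX Na 0 ste .") yield the same condition, while a line with fewer than four fields is still rejected.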

      Attachments

        1. LUCENE-4019.patch
          10 kB
          Luca Cavanna
        2. LUCENE-4019.patch
          9 kB
          Luca Cavanna
        3. LUCENE-4019.patch
          3 kB
          Luca Cavanna


          People

            Assignee: Chris Male (cmale)
            Reporter: Luca Cavanna (lucacavanna)