  1. Lucene - Core
  2. LUCENE-4019

Parsing Hunspell affix rules without regexp condition

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0-ALPHA, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We found that some recent Dutch Hunspell dictionaries contain suffix or prefix rules like the following:

       
      SFX Na N 1
      SFX Na 0 ste
      

      The rule on the second line lacks the fifth field, which should be the condition (usually a regexp). The condition is often '.', which matches any character, so the rule always applies. As explained in LUCENE-3976, the readAffix method throws an error when the condition is missing. I wonder whether we should treat the missing value as a default, such as '.'. On the other hand, I haven't found anything about this in the spec. Any thoughts?
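
      The default proposed above can be sketched as follows. This is a minimal illustration, not the attached patch and not Lucene's actual readAffix code: the class AffixRuleSketch and its methods parseRule and applies are hypothetical names. It assumes whitespace-separated fields, handles only rule lines (not the "SFX Na N 1" header line), ignores continuation flags after '/', and treats the condition as a Java regexp anchored at the end of the word.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: parse one Hunspell affix rule line and fall back to
// "." when the condition field is absent.
final class AffixRuleSketch {

    /** Hypothetical holder for the fields of a single rule line. */
    static final class AffixRule {
        final String flag;
        final String strip;
        final String affix;
        final String condition;

        AffixRule(String flag, String strip, String affix, String condition) {
            this.flag = flag;
            this.strip = strip;
            this.affix = affix;
            this.condition = condition;
        }
    }

    /** Parses a rule line such as "SFX Na 0 ste" or "SFX Na 0 ste [^e]". */
    static AffixRule parseRule(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not an affix rule line: " + line);
        }
        // "0" in the strip or affix field stands for the empty string.
        String strip = parts[2].equals("0") ? "" : parts[2];
        String affix = parts[3].equals("0") ? "" : parts[3];
        // Assumption under discussion: a missing fifth field means "always applies".
        String condition = parts.length > 4 ? parts[4] : ".";
        return new AffixRule(parts[1], strip, affix, condition);
    }

    /** Checks a suffix rule's condition against the end of the given stem. */
    static boolean applies(AffixRule rule, String stem) {
        return Pattern.compile(".*" + rule.condition).matcher(stem).matches();
    }
}
```

      With this default, parseRule("SFX Na 0 ste") yields the condition "." and the rule applies to any non-empty stem, while an explicit condition such as "[^e]" still restricts it.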

        Attachments

        1. LUCENE-4019.patch
          3 kB
          Luca Cavanna
        2. LUCENE-4019.patch
          9 kB
          Luca Cavanna
        3. LUCENE-4019.patch
          10 kB
          Luca Cavanna

            People

            • Assignee:
              cmale Chris Male
              Reporter:
              lucacavanna Luca Cavanna
            • Votes:
              0
              Watchers:
              2
