Lucene - Core / LUCENE-4019

Parsing Hunspell affix rules without regexp condition

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0-ALPHA, 5.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules like the following:

       
      SFX Na N 1
      SFX Na 0 ste
      

      The rule on the second line lacks the fifth field, which should be the condition (usually a regexp). The condition is typically '.', meaning the rule always applies (it matches any character). As explained in LUCENE-3976, the readAffix method throws an error on such lines. I wonder whether we should treat a missing condition as a default value, such as '.'. On the other hand, I haven't found anything about this case in the spec. Any thoughts?
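
      To make the proposal concrete, here is a minimal sketch of the fallback, not the code from the attached patches. The class AffixRuleSketch, its Rule holder, and parseRuleLine are hypothetical names used only for illustration. It assumes fields are whitespace-separated, that the caller has already consumed the header line (SFX Na N 1), and that a missing fifth field should default to '.'.

```java
public class AffixRuleSketch {

    /** Hypothetical holder for one parsed rule line; not Lucene's own class. */
    static final class Rule {
        final String flag;      // e.g. "Na"
        final String strip;     // characters to strip, "" when the file says 0
        final String append;    // characters to add, "" when the file says 0
        final String condition; // regexp-like condition, "." means "always"

        Rule(String flag, String strip, String append, String condition) {
            this.flag = flag;
            this.strip = strip;
            this.append = append;
            this.condition = condition;
        }
    }

    /** Assumed default when the condition field is absent. */
    static final String DEFAULT_CONDITION = ".";

    /**
     * Parses a single rule line such as "SFX Na 0 ste" or "SFX Na 0 ste .".
     * Header lines are assumed to be handled by the caller.
     */
    static Rule parseRuleLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Too few fields in affix rule: " + line);
        }
        // In affix files a literal 0 stands for "nothing".
        String strip = fields[2].equals("0") ? "" : fields[2];
        String append = fields[3].equals("0") ? "" : fields[3];
        // The fallback under discussion: no fifth field means "always apply".
        String condition = fields.length > 4 ? fields[4] : DEFAULT_CONDITION;
        return new Rule(fields[1], strip, append, condition);
    }

    public static void main(String[] args) {
        Rule rule = parseRuleLine("SFX Na 0 ste");
        System.out.println(rule.condition); // prints "."
    }
}
```

      With this fallback, the Dutch rule above parses the same way as SFX Na 0 ste . would. Whether defaulting is the right behaviour still depends on what the spec intends.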

      1. LUCENE-4019.patch
        3 kB
        Luca Cavanna
      2. LUCENE-4019.patch
        9 kB
        Luca Cavanna
      3. LUCENE-4019.patch
        10 kB
        Luca Cavanna

          People

          • Assignee:
            Chris Male
            Reporter:
            Luca Cavanna
          • Votes:
            0
            Watchers:
            3
