Lucene - Core / LUCENE-5480

Hunspell shouldn't merge dictionary entries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, Trunk
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've been writing lots of little unit tests for this thing, and I'm pretty positive I screwed this up in LUCENE-5468... sorry.

      Dictionary entries for the same word must stay separate instead of being merged. Otherwise the whole "prefix-suffix dependencies" feature described in the manpage won't work.

      Either 'words' should be changed from FST<Long> to FST<IntsRef>, or when there are duplicates we should add 'padding' that we just consume (suggester-style). The latter is a little tricky, but I think this is generally uncommon so it would keep the FST smaller.
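
To make the problem concrete, here is a minimal sketch in plain Java. It does not use Lucene's FST or the real Dictionary class: the class name HomonymEntries, its methods, and the sample lines work/A and work/B are all hypothetical. It assumes that a prefix flag and a suffix flag may only be combined when both appear on the same dictionary line. Keeping one flag set per line is loosely analogous to the FST<IntsRef> option, where a word maps to several entries rather than one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is not Lucene's hunspell Dictionary. */
final class HomonymEntries {

  /** Parses lines such as "work/A" into word -> one flag set per dictionary line. */
  static Map<String, List<Set<Character>>> loadSeparate(String... lines) {
    Map<String, List<Set<Character>>> words = new LinkedHashMap<>();
    for (String line : lines) {
      int slash = line.indexOf('/');
      String word = slash < 0 ? line : line.substring(0, slash);
      Set<Character> flags = new HashSet<>();
      if (slash >= 0) {
        for (char flag : line.substring(slash + 1).toCharArray()) {
          flags.add(flag);
        }
      }
      // Duplicate words get an additional entry instead of overwriting or merging.
      words.computeIfAbsent(word, k -> new ArrayList<>()).add(flags);
    }
    return words;
  }

  /** Simulates the merging behavior: flags of duplicate words are unioned into one set. */
  static Map<String, Set<Character>> loadMerged(String... lines) {
    Map<String, Set<Character>> merged = new LinkedHashMap<>();
    for (Map.Entry<String, List<Set<Character>>> e : loadSeparate(lines).entrySet()) {
      Set<Character> union = new HashSet<>();
      for (Set<Character> flags : e.getValue()) {
        union.addAll(flags);
      }
      merged.put(e.getKey(), union);
    }
    return merged;
  }

  /** Returns true only if a single entry for the word carries both flags. */
  static boolean allowsBoth(Map<String, List<Set<Character>>> words,
                            String word, char prefixFlag, char suffixFlag) {
    List<Set<Character>> entries =
        words.getOrDefault(word, Collections.<Set<Character>>emptyList());
    for (Set<Character> flags : entries) {
      if (flags.contains(prefixFlag) && flags.contains(suffixFlag)) {
        return true;
      }
    }
    return false;
  }
}
```

With the two lines work/A and work/B, loadSeparate keeps two entries and allowsBoth(..., "work", 'A', 'B') is false, while loadMerged reports both flags on a single "work" entry, which would let a stemmer combine an A prefix with a B suffix even though no single dictionary line permits it.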

      Shouldn't be hard to fix.

        Activity

        ASF subversion and git services added a comment -

        Commit 1572841 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1572841 ]

        LUCENE-5480: add the tests i have so far... (not including this bug yet though)

        ASF subversion and git services added a comment -

        Commit 1572842 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1572842 ]

        LUCENE-5480: add the tests i have so far... (not including this bug yet though)

        Robert Muir added a comment -

        Here is my current state. I've unraveled a few bugs with these cool little tests (the examples from the man page). I'll see how far I can get, but I wanted to snapshot here since it's some progress...

        Robert Muir added a comment -

        I think the current bug is a longstanding one: prefix stripping and suffix stripping are not intertwined (so continuation classes from prefixes don't apply to suffixes, and so on).

        This causes overstemming today.

        I'd like to fix the current bug(s) here with the uploaded patch and open a followup issue for that... it's progress.

        ASF subversion and git services added a comment -

        Commit 1573048 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1573048 ]

        LUCENE-5480: Hunspell shouldn't merge dictionary entries

        ASF subversion and git services added a comment -

        Commit 1573057 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1573057 ]

        LUCENE-5480: Hunspell shouldn't merge dictionary entries

        Uwe Schindler added a comment -

        Close issue after release of 4.8.0


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            2
