Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Previously this used three hashes (prefixes, words, suffixes): it split each word in various ways and looked up the pieces. This was changed to an FST, but the algorithm wasn't adjusted to use it properly (e.g. a single pass that terminates when it reaches a "dead end").

      This makes for slower indexing when using this stemmer.
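
The "single pass, terminate at a dead end" idea from the description can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: a plain character trie stands in for Lucene's FST, and the class name `AffixTrieSketch` and its methods `add` and `matchingPrefixLengths` are hypothetical. Instead of splitting the word at every position and doing a separate hash lookup for each piece, the word is walked once, matches are recorded along the way, and the walk stops as soon as no transition exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AffixTrieSketch {
    /** One trie node. A plain trie is an assumed stand-in for Lucene's FST. */
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean isAffix; // true if the path from the root spells a known affix
    }

    private final Node root = new Node();

    /** Registers an affix (hypothetical helper for this sketch). */
    void add(String affix) {
        Node node = root;
        for (int i = 0; i < affix.length(); i++) {
            node = node.next.computeIfAbsent(affix.charAt(i), c -> new Node());
        }
        node.isAffix = true;
    }

    /** Returns the lengths of all prefixes of word that are known affixes. */
    List<Integer> matchingPrefixLengths(String word) {
        List<Integer> lengths = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.get(word.charAt(i));
            if (node == null) {
                break; // dead end: no longer affix can match, so stop early
            }
            if (node.isAffix) {
                lengths.add(i + 1); // record a match without restarting the walk
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        AffixTrieSketch prefixes = new AffixTrieSketch();
        prefixes.add("un");
        prefixes.add("under");
        prefixes.add("re");
        System.out.println(prefixes.matchingPrefixLengths("understand")); // [2, 5]
        System.out.println(prefixes.matchingPrefixLengths("xylophone"));  // []
    }
}
```

With the hash-based approach, a word of length n needs up to n lookups per affix table, each hashing a fresh substring. The single walk touches each character at most once and often stops after the first few.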

        Activity

        jira-bot ASF subversion and git services added a comment -

        Commit 1587163 from rmuir@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1587163 ]

        LUCENE-5603: fix hunspell to use FST efficiently

        jira-bot ASF subversion and git services added a comment -

        Commit 1587162 from rmuir@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1587162 ]

        LUCENE-5603: fix hunspell to use FST efficiently

        mikemccand Michael McCandless added a comment -

        +1, looks good!

        rcmuir Robert Muir added a comment -

        Here's a patch.

        Reusing my previous benchmark (with Polish, see the last comment on SOLR-3245), indexing speed increases from 2400 docs/second to 2900 docs/second. So it's not much of a relative increase in speed (due to some properties of this dictionary), but I still think it's worth it. And of course it's much better than the 71 docs/second in Lucene 4.7.


          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0
            Watchers:
            2
