Lucene - Core / LUCENE-5468

Hunspell very high memory use when loading dictionary

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 4.8, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The Hunspell stemmer requires a gigantic (for the task) amount of memory to load dictionary and rules files.
      For example, loading a 4.5 MB Polish dictionary (with an empty index!) causes the whole core to crash with various out-of-memory errors unless the max heap size is set close to 2 GB or more.
      By comparison, Stempel using the same dictionary file works just fine with 1/8 of that heap (and possibly with lower values as well).
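
A rough way to put numbers on a report like this is to read the JVM's heap limit and the used heap before and after the load. The sketch below is only an illustration and is not part of Lucene: the class `HeapProbe` and its methods are hypothetical names, and a 4.5 MB byte array stands in for the real dictionary load, which would be substituted in the `Supplier`. Readings are approximate because `Runtime.gc()` is only a hint to the JVM.

```java
import java.lang.ref.Reference;
import java.util.Locale;
import java.util.function.Supplier;

public class HeapProbe {
    /** Maximum heap the JVM will attempt to use (the -Xmx value or the default). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use, read after suggesting a GC to reduce noise. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; the reading stays approximate
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a byte count as mebibytes with one decimal place. */
    public static String formatMiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }

    /** Runs the loader and returns the approximate growth in used heap. */
    public static <T> long measureGrowth(Supplier<T> loader) {
        long before = usedHeapBytes();
        T loaded = loader.get();
        long after = usedHeapBytes();
        // Keep the loaded object reachable until after the second reading.
        Reference.reachabilityFence(loaded);
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + formatMiB(maxHeapBytes()));
        // Hypothetical stand-in for the dictionary load: 4.5 MB of raw bytes.
        long growth = measureGrowth(() -> new byte[4_718_592]);
        System.out.println("approx. growth: " + formatMiB(growth));
    }
}
```

Running it with different limits, for example `java -Xmx256m HeapProbe.java` and `java -Xmx2g HeapProbe.java`, shows how the reported maximum changes. Comparing the measured growth against the on-disk file size gives a rough expansion factor for whatever loader is plugged in.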

      Sample error log entries:
      http://pastebin.com/fSrdd5W1
      http://pastebin.com/Lmi0re7Z

        Attachments

        1. LUCENE-5468.patch
          239 kB
          Robert Muir
        2. patch.txt
          15 kB
          Robert Muir

            People

            • Assignee:
              Unassigned
            • Reporter:
              Maciej Lisiewski (c2h5oh)
            • Votes:
              1
            • Watchers:
              7