Lucene - Core / LUCENE-9930

UkrainianMorfologikAnalyzer reloads its Dictionary for every new TokenStreamComponents instance

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Large static data structures should be loaded in the Analyzer constructor and shared between threads. UkrainianMorfologikAnalyzer instead loads its dictionary in `createComponents`, so the dictionary is reloaded and held separately for every new analysis thread. With a large dictionary and highly concurrent indexing, this can exhaust memory, because each thread keeps its own copy of the dictionary in a thread local.
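
      The difference between the two loading strategies can be illustrated with a minimal, self-contained sketch. It does not use the Lucene or Morfologik APIs: `Dictionary`, `TokenAnalyzer`, `ReloadingAnalyzer`, `SharedAnalyzer`, and `countLoads` are hypothetical names invented for this example, and a `ThreadLocal` stands in for the per-thread reuse of analysis components described above. The sketch only counts how many times the dictionary is loaded under each strategy.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedDictionaryDemo {

    /** Hypothetical stand-in for a large, immutable dictionary; counts its loads. */
    public static final class Dictionary {
        static final AtomicInteger LOADS = new AtomicInteger();
        private final Map<String, String> lemmas;

        private Dictionary(Map<String, String> lemmas) {
            this.lemmas = lemmas;
        }

        /** Simulates the expensive load step (transliterated sample entries). */
        public static Dictionary load() {
            LOADS.incrementAndGet();
            return new Dictionary(Map.of("knyhy", "knyha", "mista", "misto"));
        }

        public String lemma(String token) {
            return lemmas.getOrDefault(token, token);
        }
    }

    /** Hypothetical minimal analyzer interface for this sketch. */
    public interface TokenAnalyzer {
        String analyze(String token);
    }

    /** Problem pattern: the dictionary is loaded when per-thread components are created. */
    public static final class ReloadingAnalyzer implements TokenAnalyzer {
        private final ThreadLocal<Dictionary> perThread =
                ThreadLocal.withInitial(this::createComponents);

        private Dictionary createComponents() {
            return Dictionary.load(); // one full copy per analysis thread
        }

        @Override
        public String analyze(String token) {
            return perThread.get().lemma(token);
        }
    }

    /** Suggested pattern: load once in the constructor, share the instance across threads. */
    public static final class SharedAnalyzer implements TokenAnalyzer {
        private final Dictionary dictionary;
        private final ThreadLocal<Dictionary> perThread;

        public SharedAnalyzer() {
            this.dictionary = Dictionary.load(); // single load for all threads
            this.perThread = ThreadLocal.withInitial(this::createComponents);
        }

        private Dictionary createComponents() {
            return dictionary; // per-thread components only reference the shared copy
        }

        @Override
        public String analyze(String token) {
            return perThread.get().lemma(token);
        }
    }

    /** Builds an analyzer, uses it from threadCount new threads, returns the number of loads. */
    public static int countLoads(Supplier<TokenAnalyzer> factory, int threadCount) {
        int before = Dictionary.LOADS.get();
        TokenAnalyzer analyzer = factory.get();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> analyzer.analyze("knyhy"));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Dictionary.LOADS.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("reloading: " + countLoads(ReloadingAnalyzer::new, 4) + " loads");
        System.out.println("shared: " + countLoads(SharedAnalyzer::new, 4) + " loads");
    }
}
```

      With four analysis threads, the reloading variant loads the dictionary four times and the shared variant loads it once. Sharing is only safe here because the stand-in dictionary is immutable after loading; the sketch assumes the same holds for whatever structure is shared in a real analyzer.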

            People

              Assignee: Alan Woodward (romseygeek)
              Reporter: Alan Woodward (romseygeek)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m