Lucene - Core / LUCENE-6046

RegExp.toAutomaton high memory use

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.10.1
    • Fix Version/s: 4.10.3, 5.0, 6.0
    • Component/s: core/queryparser
    • Labels: None
    • Lucene Fields: New

    Description

      When creating an automaton from an org.apache.lucene.util.automaton.RegExp, the automaton can need so much memory that one of its internal arrays would exceed the maximum array size in Java.

      The following call caused an OutOfMemoryError with a 32 GB heap:

      new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
      

      With the heap increased to 60 GB, the following exception is thrown instead:

        1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
        1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
        1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
        1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
        1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
        1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
        1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
        1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
        1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
        1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
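
      The trace above fails inside Operations.determinize, and determinizing an NFA can need exponentially many states. The sketch below is a self-contained illustration of that blowup and of one possible guard, a cap on the number of DFA states. It is not Lucene code: the class DeterminizeBudget, the exception TooManyStatesException, and the maxStates parameter are hypothetical names chosen for this example, and the toy pattern (a|b)*a(a|b){n} stands in for the loop-plus-bounded-repetition shape of the regular expression in the report.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class DeterminizeBudget {

    /** Hypothetical exception type for this sketch; not a Lucene class. */
    static class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int limit) {
            super("determinization needs more than " + limit + " states");
        }
    }

    /**
     * Counts the DFA states that subset construction produces for the NFA of
     * (a|b)*a(a|b){n}, giving up once more than maxStates have been found.
     * NFA states are numbered 0..n+1 and stored as bits of a long, so n <= 62.
     */
    static int countDfaStates(int n, int maxStates) {
        if (n < 0 || n > 62) {
            throw new IllegalArgumentException("n must be in [0, 62]");
        }
        Set<Long> seen = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        long start = 1L; // only NFA state 0 is active at the start
        seen.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            long set = work.pop();
            for (char c : new char[] {'a', 'b'}) {
                long next = step(set, c, n);
                if (seen.add(next)) {
                    // Fail fast instead of growing until memory runs out.
                    if (seen.size() > maxStates) {
                        throw new TooManyStatesException(maxStates);
                    }
                    work.push(next);
                }
            }
        }
        return seen.size();
    }

    /** Applies one input character to a set of active NFA states. */
    static long step(long set, char c, int n) {
        long next = 0L;
        for (int s = 0; s <= n; s++) { // state n+1 has no outgoing transitions
            if ((set & (1L << s)) == 0) {
                continue;
            }
            if (s == 0) {
                next |= 1L; // state 0 loops on both 'a' and 'b'
                if (c == 'a') {
                    next |= 1L << 1; // 'a' may also start the counted tail
                }
            } else {
                next |= 1L << (s + 1); // counted tail accepts any character
            }
        }
        return next;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("n=" + n + " -> " + countDfaStates(n, 1 << 20) + " DFA states");
        }
        try {
            countDfaStates(30, 10_000);
        } catch (TooManyStatesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      For this toy pattern the state count doubles with each increment of n (2^(n+1) states), so a bound of only a few dozen repetitions already exceeds any practical budget. Rejecting the pattern early turns an OutOfMemoryError into an error the caller can handle.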
      

      Attachments

        1. LUCENE-6046.patch
          90 kB
          Nik Everett
        2. LUCENE-6046.patch
          63 kB
          Michael McCandless
        3. LUCENE-6046.patch
          71 kB
          Nik Everett

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Lee Hinman (dakrone)
            Votes: 0
            Watchers: 6