Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7914

Add safeguards to RegExp.toAutomaton and Operations

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 7.0, 8.0
    • None
    • None
    • New

    Description

      When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates:

      a{10000}
      

      The example above <b>creates</b> a single path with 10k states already determinized so maxDeterminizedStates is not checked.
      Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive and the maximum level of recursion depends on the longest path in the automaton. So a large automaton like above can exceed java's stack.
      In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial cases where a large automaton is created from a small input so I think we should also have safeguards in the recursive methods.

      I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger than 1000 would throw an IllegalStateException.
      The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create and throw a TooComplex..Exception when this limit is reached.
      Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses this as an optimization for PrefixQuery that uses infinite automatons only.

      Attachments

        1. LUCENE-7914.patch
          14 kB
          Jim Ferenczi
        2. LUCENE-7914.patch
          14 kB
          Jim Ferenczi
        3. LUCENE-7914.patch
          12 kB
          Jim Ferenczi
        4. LUCENE-7914.patch
          11 kB
          Jim Ferenczi
        5. LUCENE-7914.patch
          11 kB
          Jim Ferenczi

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jim.ferenczi Jim Ferenczi
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: