Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6672

CompiledAutomaton can generate a binary automaton that have more than 12*maxDeterminizedStates

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 4.10.3, 6.0
    • None
    • core/index
    • None

    Description

      The maxDeterminizedStates parameter to Automaton has introduced a way to prevent massive states explosion during the generation of Automatas. This is a nice feature to protect applications against DoS attacks. Unfortunately in some cases like wildcard queries with a lot of wildcards the resulting binary Automaton can exceed maxDeterminizedStates by a factor of ~12.
      If I configure my application with the default maxDeterminizedStates to 10,000 CompiledAutomaton can potentially generate Automatas with more than 120,000 states.

      This is because UTF32ToUTF8 ignores maxDeterminizedStates and can generate a large binary automata that will be passed to the costly Operations.getCommonSuffixBytesRef.

      Current workaround is to set maxDeterminizedStates to expectedMaxStates/13.

      I'm not sure what's the best way to fix this issue, UTF32ToUTF8.convert() uses the Automaton.Builder which is very fast to create states, adding a check after each state creation is maybe not the best idea.

      A partial quick fix could be to check the size of the resulting binary automata and fail before running the costly Operations.getCommonSuffixBytesRef.

      Another fix would be to generalize maxDeterminizedStates to maxStates at the Automaton.Builder level. The maxStates could be checked before costly operations (before ArrayUtil.grow in addTransition and in finishState). Unfortunately this one requires more refactoring (not included in the patch).

      I included two patches to illustrate the above two fixes.

      Attachments

        1. maxStates-overview.patch
          3 kB
          David Causse
        2. quick-fix.patch
          0.7 kB
          David Causse

        Activity

          People

            Unassigned Unassigned
            dcausse David Causse
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: