Lucene - Core
  1. Lucene - Core
  2. LUCENE-2620

Queries with too many asterisks causing 100% CPU usage

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1
    • Fix Version/s: 2.9.4, 3.0.3, 3.1
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Debian Lenny with Tomcat 5.5 and Mac OS X 10.6 with Tomcat 6, probably others

    • Lucene Fields:
      New, Patch Available

      Description

      If a search query has many adjacent asterisks (e.g. fo**************obar), I can get my webapp caught in a loop that does not seem to end in a reasonable amount of time and may in fact be infinite. For just a few asterisks the query eventually does return some results, but as I add more it takes a longer and longer amount of time. After about six or seven asterisks the query never seems to finish. Even if I abort the search, the thread handling the troublesome query continues running in the background and pinning a CPU.

      I found the problem in src/java/org/apache/lucene/search/WildcardTermEnum.java on Lucene 3.0.1 and it looks like 3.0.2 ought to be affected as well. I'm not sure about trunk, though. I have a patch that fixes the problem for me in 3.0.1.

      1. lucene-asterisks.diff
        0.6 kB
        Nick Barkas
      2. LUCENE-2620_3x.patch
        2 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Hello Nick, thanks for your patch.

        In trunk this is no problem, because wildcard query works in a very different way and both foo**********bar and foo*bar are compiled to the same matcher:

            WildcardQuery wq = new WildcardQuery(new Term("foo", "foo*******bar"));
            WildcardQuery wq2 = new WildcardQuery(new Term("foo", "foo*bar"));
            assertEquals(wq.automaton.getNumberOfStates(), wq2.automaton.getNumberOfStates());
            assertEquals(wq.automaton.getNumberOfTransitions(), wq2.automaton.getNumberOfTransitions());
        

        But at a glance, your patch looks like a potentially useful optimization for 3.x

        Show
        Robert Muir added a comment - Hello Nick, thanks for your patch. In trunk this is no problem, because wildcard query works in a very different way and both foo**********bar and foo*bar are compiled to the same matcher: WildcardQuery wq = new WildcardQuery(new Term("foo", "foo*******bar")); WildcardQuery wq2 = new WildcardQuery(new Term("foo", "foo*bar")); assertEquals(wq.automaton.getNumberOfStates(), wq2.automaton.getNumberOfStates()); assertEquals(wq.automaton.getNumberOfTransitions(), wq2.automaton.getNumberOfTransitions()); But at a glance, your patch looks like a potentially useful optimization for 3.x
        Hide
        Robert Muir added a comment -

        I took a look at this, and the worst-case behavior in 3x etc is, in my opinion, definitely bug territory.

        when 3x's wildcardEquals() encounters a '*', it does this:

        for (int i = string.length(); i >= s; --i)
                  {
                    if (wildcardEquals(pattern, p, string, i))
                    {
                      return true;
                    }
                  }
        

        This is itself already inside a loop in wildcardEquals, so its a disaster.

        I added a test for this, and Nick's fix (with one needed length check) and the tests pass.
        but if you run the test without the change, you will see what Nick is experiencing.

        Show
        Robert Muir added a comment - I took a look at this, and the worst-case behavior in 3x etc is, in my opinion, definitely bug territory. when 3x's wildcardEquals() encounters a '*', it does this: for ( int i = string.length(); i >= s; --i) { if (wildcardEquals(pattern, p, string, i)) { return true ; } } This is itself already inside a loop in wildcardEquals, so its a disaster. I added a test for this, and Nick's fix (with one needed length check) and the tests pass. but if you run the test without the change, you will see what Nick is experiencing.
        Hide
        Robert Muir added a comment -

        Assigning 2.9.x and 3.0.x fix versions as, it seems to loop infinitely (or the runtime is so terrible it might as well be infinite).

        Show
        Robert Muir added a comment - Assigning 2.9.x and 3.0.x fix versions as, it seems to loop infinitely (or the runtime is so terrible it might as well be infinite).
        Hide
        Robert Muir added a comment -

        Committed to 3.x (988620) 3.0.x (988638) 2.9.x (988682).

        Thanks Nick!

        Show
        Robert Muir added a comment - Committed to 3.x (988620) 3.0.x (988638) 2.9.x (988682). Thanks Nick!

          People

          • Assignee:
            Robert Muir
            Reporter:
            Nick Barkas
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development