SOLR-737

Incorrect 500 error reported with maxClauseCount limit

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      Here is my installation:
      Solr Specification Version: 1.2.2008.08.13.13.05.16
      Lucene Implementation Version: 2.4-dev 685576 - 2008-08-13 10:55:25

      I did the following query today:
      author:(r*a* AND fisher)

      And get the following 500 error:

      maxClauseCount is set to 1024

      org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
      at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
      at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
      at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)
      at org.apache.lucene.search.WildcardQuery.rewrite(WildcardQuery.java:54)
      at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
      at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:163)
      at org.apache.lucene.search.Query.weight(Query.java:94)
      at org.apache.lucene.search.Searcher.createWeight(Searcher.java:175)
      at org.apache.lucene.search.Searcher.search(Searcher.java:126)
      at org.apache.lucene.search.Searcher.search(Searcher.java:105)
      at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
      at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
      at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
      at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:167)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
      at org.mortbay.jetty.Server.handle(Server.java:324)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
      at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
      at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

      1. SOLR-737.patch
        5 kB
        Yonik Seeley

        Activity

        Andrew Nagy added a comment -

        Why has this been relegated to an improvement and held to 1.4?

        This is a major showstopper bug for me - unless I am understanding something incorrectly?

        Mark Miller added a comment -

        Because it's an artificial limitation from Lucene - truncation queries expand to one clause per matching term in the index - generate enough of these clauses and you have a really slow search. Lucene bails at the default of 1024. Not sure if this setting is available in Solr, but as Otis marked this as an improvement, I would guess not and the idea is to add it. It's not a bug though - your wildcard term is just matching over 1024 terms in the index.
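A minimal sketch of the expansion described above, using only the Java standard library. It is not Lucene code: the class WildcardExpansionSketch, its methods, and the toy term set are hypothetical, and the real rewrite lives in MultiTermQuery.rewrite as the stack trace shows. The sketch adds one clause per indexed term that matches the wildcard and fails once the limit is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-in for a wildcard rewrite; illustrative only, not Lucene code.
class WildcardExpansionSketch {
    static final int MAX_CLAUSE_COUNT = 1024; // the default mentioned in this thread

    // Converts a pattern using * (any run of characters) and ? (one character) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Expands the wildcard into one clause per matching indexed term and
    // throws as soon as another clause would exceed maxClauses.
    static List<String> expand(String wildcard, SortedSet<String> indexedTerms, int maxClauses) {
        Pattern p = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                if (clauses.size() >= maxClauses) {
                    throw new IllegalStateException("maxClauseCount is set to " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) terms.add("ra" + i); // 2000 terms match r*a*
        terms.add("fisher");
        System.out.println(expand("fish*", terms, MAX_CLAUSE_COUNT)); // one clause
        try {
            expand("r*a*", terms, MAX_CLAUSE_COUNT); // more matches than the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model the failure depends only on how many indexed terms match the pattern, not on how many documents are returned, which is why the same query can work on a small index and fail on a large one.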

        Andrew Nagy added a comment -

        Thanks Mark for clarification. This makes sense now. Solr does already have a configurable maxClauseCount and the default is 1024.

        Can anyone supply more information on whether or not this is something that can be enhanced in Lucene - for me this is a very important query and since I have well over a million documents - I will never be able to issue this query.

        Erik Hatcher added a comment -

        Andrew - for one, you can increase the boolean clause limit (at the risk of a less performant query). In solrconfig, adjust this: <maxBooleanClauses>1024</maxBooleanClauses>

        Also, there are many tricks that can be played to make wildcard querying more efficient if you are willing to sacrifice index size and manage the index analyzer and query analyzer carefully. Have a look at this topic in the java-user@lucene archives. I did a lot of work once upon a time for a client that involved term rotation during indexing and then morphing wildcard queries to have maximal prefix for best efficiency.
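One way to picture term rotation is the permuterm-style sketch below. This is an assumption about the general technique, not a description of the client work mentioned above: the class TermRotationSketch, the end marker '$', and both methods are hypothetical. Each term is indexed under every rotation of itself plus an end marker, and a pattern with a single '*' is rotated so the wildcard lands at the end, turning it into a plain prefix lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical permuterm-style helper; names are illustrative only.
class TermRotationSketch {
    static final char END = '$'; // assumed never to occur in indexed terms

    // All rotations of term + END, to be indexed alongside the original term.
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Rewrites a pattern containing exactly one '*' into the prefix to look up
    // among the rotated terms: "X*Y" becomes "Y" + END + "X".
    static String toPrefix(String wildcard) {
        int star = wildcard.indexOf('*');
        if (star < 0 || star != wildcard.lastIndexOf('*')) {
            throw new IllegalArgumentException("sketch handles exactly one '*'");
        }
        String before = wildcard.substring(0, star);
        String after = wildcard.substring(star + 1);
        return after + END + before;
    }

    public static void main(String[] args) {
        System.out.println(rotations("fish")); // five rotations of "fish$"
        System.out.println(toPrefix("f*sh"));  // prefix to search among the rotations
    }
}
```

The cost is the one named above: every term of length n adds n + 1 entries to the index. Patterns with several wildcards, such as r*a*, would need extra handling (for example a prefix lookup followed by a filter), which this sketch leaves out.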

        As a thought experiment - consider what you'd do if you had to satisfy a patron's request for "find me all books matching r*a* in the title" using a card catalog system!

        Andrew Nagy added a comment -

        Sorry to keep blathering on - but I am trying to understand this issue better.

        If I issue the query (r* AND fisher) the results come back to me immediately ... no slowdown whatsoever. And r* is going to have many more possibilities than r*a* - it still seems like there is a bug here.

        Can anyone clarify how lucene handles this?

        Yonik Seeley added a comment - edited

        r* is a prefix query that Solr turns into a ConstantScorePrefixQuery
        r*a* is a wildcard query. It should eventually get the same treatment, but we don't currently have a ConstantScoreWildcardQuery.
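The difference can be sketched with a toy inverted index. This is a hedged illustration of the constant-score idea, not Solr or Lucene source: the class ConstantScoreSketch, the method matchingDocs, and the map-based index are all hypothetical. Instead of building one boolean clause per matching term, the sketch sets a bit for every document that contains any matching term, so no clause limit applies and every hit gets the same score.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical illustration of constant-score matching; not Solr or Lucene code.
class ConstantScoreSketch {
    // index maps each term to the ids of the documents that contain it.
    // Returns the set of documents containing at least one matching term.
    static BitSet matchingDocs(Map<String, int[]> index, Predicate<String> termMatches, int maxDoc) {
        BitSet docs = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            if (termMatches.test(e.getKey())) {
                for (int doc : e.getValue()) docs.set(doc); // no clause is created
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new TreeMap<>();
        index.put("rabbit", new int[] {0, 2});
        index.put("ram", new int[] {2, 3});
        index.put("rest", new int[] {1});
        // The regex r.*a.* stands in for the wildcard r*a*.
        BitSet hits = matchingDocs(index, t -> t.matches("r.*a.*"), 4);
        System.out.println(hits); // prints {0, 2, 3}
    }
}
```

The trade-off in this model is that the document set costs the same to build however few terms match, and per-term relevance scoring is given up.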

        Yonik Seeley added a comment -

        Here's a quick patch to fix things.

        Yonik Seeley added a comment -

        Thinking on this a little further, I do think this is a bug, and I do think it warrants going into 1.3.

        The original range and prefix queries were broken, and I fixed them via ConstantScoreQuery. I never did it for wildcard query because the company I worked for at the time didn't use them. But any query that explodes when you change the index is arguably broken.

        Objections to this going into 1.3?

        Shalin Shekhar Mangar added a comment -

        +1 for marking for 1.3

        Does it also make execution of these queries any faster? Sorry, I'm not very familiar with ConstantScoreQuery and related Lucene classes.

        Yonik Seeley added a comment -

        Does it also make execution of these queries any faster?

        On balance, I think so. If only a few terms would be matched, it will be a little slower. If a lot of terms are matched, then it will normally be faster.

        Yonik Seeley added a comment -

        Committed.
        I'm currently figuring out how to do a merge to commit it to the 1.3 branch also.


          People

          • Assignee:
            Unassigned
            Reporter:
            Andrew Nagy
          • Votes:
            1
            Watchers:
            0
