SOLR-13336

solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can result in exponential expansion of naive queries

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.0, 8.0
    • Fix Version/s: 8.1, 9.0
    • Component/s: query parsers
    • Labels: None

    Description

      Changes made in Solr 7.0 set the effective value of BooleanQuery.getMaxClauseCount to Integer.MAX_VALUE-1 and only imposed a restriction based on the (existing) solrconfig.xml setting at the Solr query parser level, via a new utility helper method.

      But this means programmatically generated queries (either by low-level Lucene methods, or by query rewriting) no longer had any safety valve to prevent (effectively) infinite expansion. This issue fixes the problem by:

      • restoring a default upper bound on BooleanQuery.getMaxClauseCount of 1024
      • introducing a new solr.xml level setting for configuring this upper bound:
        <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
        

      NOTE that this solr.xml limit is a hard upper bound that supersedes the existing solrconfig.xml setting, which has been left in place and still limits the size of user-specified boolean queries. That is: solr.xml maxBooleanClauses >= solrconfig.xml maxBooleanClauses >= number of clauses a user explicitly specifies in a query string; and solr.xml maxBooleanClauses >= number of clauses in an expanded/rewritten query.
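
The relationship between the two limits can be sketched as a small, self-contained Java model. This is only an illustration, not Solr's or Lucene's actual code: the class ClauseLimits, its methods, and TooManyClausesException are hypothetical names, and capping the per-core value at the global bound is an assumption drawn from the inequality above.

```java
/** Simplified model of the two limits; these are not Solr's or Lucene's real classes. */
final class ClauseLimits {

    /** Hypothetical stand-in for the exception raised when a limit is exceeded. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String message) {
            super(message);
        }
    }

    private final int globalMax; // models solr.xml maxBooleanClauses (hard upper bound)
    private final int parserMax; // models solrconfig.xml maxBooleanClauses (per core)

    ClauseLimits(int globalMax, int configuredParserMax) {
        this.globalMax = globalMax;
        // Assumption: a per-core value above the global bound is capped to it.
        this.parserMax = Math.min(globalMax, configuredParserMax);
    }

    int effectiveParserMax() {
        return parserMax;
    }

    /** Parser-level check: clauses the user wrote explicitly in the query string. */
    void checkUserQuery(int explicitClauses) {
        if (explicitClauses > parserMax) {
            throw new TooManyClausesException(
                explicitClauses + " explicit clauses exceed parser limit " + parserMax);
        }
    }

    /** Global check: applies to every boolean query, including expanded/rewritten ones. */
    void checkAnyQuery(int clauses) {
        if (clauses > globalMax) {
            throw new TooManyClausesException(
                clauses + " clauses exceed global limit " + globalMax);
        }
    }

    public static void main(String[] args) {
        ClauseLimits limits = new ClauseLimits(1024, 512);
        limits.checkUserQuery(300);  // user-written clauses: within the per-core limit
        limits.checkAnyQuery(1000);  // the same query rewritten to 1000 clauses: still allowed
        try {
            limits.checkAnyQuery(2048); // expansion beyond the hard bound is rejected
        } catch (TooManyClausesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a query may legitimately grow past the per-core limit during rewriting, but never past the global one, which is the safety valve this issue restores.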

      original bug report

      Since SOLR-10921 it appears that Solr always sets BooleanQuery.maxClauseCount (at the Lucene level) to Integer.MAX_VALUE-1. I assume this is because Solr parses maxBooleanClauses out of the config and applies it externally.

      In any case, when used as part of lucene.util.QueryBuilder.analyzeGraphPhrase (and possibly other places?), the Lucene code checks internally against only the static maxClauseCount variable (permanently set to Integer.MAX_VALUE-1 in the context of Solr).

      Thus in at least one case (analyzeGraphPhrase(), but possibly others?), maxBooleanClauses is having no effect. I'm pretty sure this is what's underlying the issue reported here as being related to Solr 7.6.

      To summarize, users are definitely susceptible (to varying degrees of likely severity, assuming no actual malicious attack) if:

      1. Running Solr >= 7.6.0
      2. Using edismax with "ps" param set to >0
      3. Query-time analysis chain is at all capable of producing graphs (e.g., WordDelimiterGraphFilter, SynonymGraphFilter that has corresponding synonyms with varying token lengths).

      Users are particularly vulnerable in practice if they have query-time WordDelimiterGraphFilter configured with preserveOriginal=true.
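
To see why such configurations can blow up, here is a rough Java sketch of the arithmetic. It is a simplified model, not Lucene's analyzeGraphPhrase implementation: it assumes each query term independently offers a fixed number of alternative paths through the token graph (for example 2 when a term such as "foo-bar" can be read either as the preserved original or as its split parts), and that every full path becomes one clause. The class and method names are hypothetical.

```java
/** Simplified model of path enumeration over a token graph; not Lucene code. */
final class GraphPhraseExpansion {

    /**
     * Counts the distinct paths through a graph in which position i offers
     * alternativesPerPosition[i] independent alternatives. The total is the
     * product of the per-position counts, so it grows exponentially with
     * the number of positions that offer more than one alternative.
     */
    static long countPaths(int[] alternativesPerPosition) {
        long paths = 1;
        for (int alternatives : alternativesPerPosition) {
            // multiplyExact throws ArithmeticException on overflow instead of wrapping
            paths = Math.multiplyExact(paths, (long) alternatives);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Eleven query terms, each assumed to have two alternative readings.
        int[] alternatives = new int[11];
        java.util.Arrays.fill(alternatives, 2);
        long paths = countPaths(alternatives);
        System.out.println(paths);        // prints 2048
        System.out.println(paths > 1024); // prints true: above the restored default bound
    }
}
```

Under these assumptions, a query of only eleven such terms already enumerates more phrases than the default limit of 1024 allows, and each additional term doubles the count.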

      To clarify, Lucene/Solr 7.6 didn't exactly introduce the issue; it only increased the likelihood of problems manifesting (as a result of LUCENE-8531). Notably, the "enumerated strings" approach to graph phrase query (reintroduced by LUCENE-8531) was previously in place pre-6.5 – at which point it could rely on the default Lucene-level maxClauseCount failsafe (removed as of 7.0). This explains the odd "Affects versions": maxBooleanClauses was disabled at the Lucene level (in Solr contexts) starting with version 7.0, but the change became more likely to manifest problems for users as of 7.6.

      Attachments

        1. SOLR-13336.patch
          26 kB
          Chris M. Hostetter
        2. SOLR-13336.patch
          26 kB
          Chris M. Hostetter
        3. SOLR-13336.patch
          22 kB
          Chris M. Hostetter

            People

              hossman Chris M. Hostetter
              magibney Michael Gibney
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: