Lucene - Core
  1. Lucene - Core
  2. LUCENE-3510

BooleanScorer should not limit number of prohibited clauses

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today it's limited to 32, because it uses a separate bit in the mask
      for each clause.

      But I don't understand why it does this; I think all prohibited
      clauses can share a single boolean/bit? Any match on a prohibited
      clause sets this bit and the doc is not collected; we don't need each
      prohibited clause to have a dedicated bit?

      We also use the mask for required clauses, but this code is now
      commented out (we always use BS2 if there are any required clauses);
      if we re-enable this code (and I think we should, at least in certain
      cases: I suspect it'd be faster than BS2 in many cases), I think we
      can cutover to an int count instead of bit masks, and then have no
      limit on the required clauses sent to BooleanScorer also.

      Separately I cleaned a few things up about BooleanScorer: all of the
      embedded scorer methods (nextDoc, docID, advance, score) now throw
      UOE; pre-allocate the buckets instead of doing it lazily
      per-sub-collect.

      1. LUCENE-3510.patch
        9 kB
        Michael McCandless

        Activity

        Hide
        Uwe Schindler added a comment -

        Bulk close after release of 3.5

        Show
        Uwe Schindler added a comment - Bulk close after release of 3.5
        Hide
        Michael McCandless added a comment -

        Thanks Doug, that makes sense! So it looks like this is a safe change.

        Show
        Michael McCandless added a comment - Thanks Doug, that makes sense! So it looks like this is a safe change.
        Hide
        Doug Cutting added a comment -

        Using a single bit to track prohibited terms seems reasonable, plus a count for required terms.

        I don't recall the exact history of the original implementation. I think it may have been in order to support more complex boolean expressions. Any boolean expression can be rewritten to disjunctive normal form, which can then be evaluated with a set of required/prohibited mask pairs, one per conjunctive clause. This is something I'd implemented previously and probably had in mind when implementing BooleanScorer. A Lucene boolean query is effectively a single such conjunctive clause, since the optional terms can be ignored when evaluating the boolean expression, so would reduce to a single pair of masks. But, as you observe, this single clause DNF case can be further simplified to a boolean and a count of required terms. Does that make sense?

        Show
        Doug Cutting added a comment - Using a single bit to track prohibited terms seems reasonable, plus a count for required terms. I don't recall the exact history of the original implementation. I think it may have been in order to support more complex boolean expressions. Any boolean expression can be rewritten to disjunctive normal form, which can then be evaluated with a set of required/prohibited mask pairs, one per conjunctive clause. This is something I'd implemented previously and probably had in mind when implementing BooleanScorer. A Lucene boolean query is effectively a single such conjunctive clause, since the optional terms can be ignored when evaluating the boolean expression, so would reduce to a single pair of masks. But, as you observe, this single clause DNF case can be further simplified to a boolean and a count of required terms. Does that make sense?
        Hide
        Michael McCandless added a comment -

        I feel bad for Tate:

        http://www.gossamer-threads.com/lists/lucene/java-user/8403

        But at least finally we are addressing this.

        Show
        Michael McCandless added a comment - I feel bad for Tate: http://www.gossamer-threads.com/lists/lucene/java-user/8403 But at least finally we are addressing this.
        Hide
        Michael McCandless added a comment -

        Patch.

        Show
        Michael McCandless added a comment - Patch.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development