Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-395

CoordConstrainedBooleanQuery + QueryParser support

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

    • Bugzilla Id:
      35284

      Description

      Attached 2 new classes:

      1) CoordConstrainedBooleanQuery
      A boolean query that only matches if a specified number of the contained clauses
      match. An example use might be a query that returns a list of books where ANY 2
      people from a list of people were co-authors, eg:
      "Lucene In Action" would match ("Erik Hatcher" "Otis Gospodnetić" "Mark Harwood"
      "Doug Cutting") with a minRequiredOverlap of 2 because Otis and Erik wrote that.
      The book "Java Development with Ant" would not match because only 1 element in
      the list (Erik) was selected.

      2) CustomQueryParserExample
      A customised QueryParser that allows definition of
      CoordConstrainedBooleanQueries. The solution (mis)uses fieldnames to pass
      parameters to the custom query.

      1. ASF.LICENSE.NOT.GRANTED--CoordConstrainedBooleanQuery.java
        5 kB
        Mark Harwood
      2. ASF.LICENSE.NOT.GRANTED--CoordConstrainedBooleanQuery.java
        2 kB
        Mark Harwood
      3. ASF.LICENSE.NOT.GRANTED--CustomQueryParserExample.java
        4 kB
        Mark Harwood
      4. ASF.LICENSE.NOT.GRANTED--CustomQueryParserExample.java
        3 kB
        Mark Harwood
      5. BooleanQuery.patch
        3 kB
        Yonik Seeley
      6. BooleanQuery.patch
        3 kB
        Yonik Seeley
      7. BooleanScorer2.java
        13 kB
        Paul Elschot
      8. LUCENE-395.patch
        6 kB
        Hoss Man
      9. LUCENE-395.patch
        5 kB
        Hoss Man
      10. LUCENE-395.patch
        5 kB
        Hoss Man
      11. TestBoolean2Patch5.txt
        1 kB
        Paul Elschot
      12. TestBooleanMinShouldMatch.java
        14 kB
        Yonik Seeley
      13. TestBooleanMinShouldMatch.java
        11 kB
        Hoss Man
      14. TestBooleanMinShouldMatch.java
        11 kB
        Paul Elschot
      15. TestBooleanMinShouldMatch.java
        13 kB
        Hoss Man
      16. TestBooleanMinShouldMatch.java
        10 kB
        Hoss Man
      17. TestBooleanMinShouldMatch.java
        6 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          Created an attachment (id=15346)
          CoordConstrainedBooleanQuery class

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - Created an attachment (id=15346) CoordConstrainedBooleanQuery class
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          Created an attachment (id=15347)
          CustomQueryParserExample class

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - Created an attachment (id=15347) CustomQueryParserExample class
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          Damn. This coord trick works nicely on its own, eg
          authors:(min_coord:2 Erik Otis Doug)

          but doesnt work when combined with other queries like this:

          title:"lucene in action" AND authors:(min_coord:3 Erik Otis Doug)

          A result is returned even thought the coord restriction of 3 is NOT satisfied
          (Doug was not an author). Needs some more investigation.....

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - Damn. This coord trick works nicely on its own, eg authors:(min_coord:2 Erik Otis Doug) but doesnt work when combined with other queries like this: title:"lucene in action" AND authors:(min_coord:3 Erik Otis Doug) A result is returned even thought the coord restriction of 3 is NOT satisfied (Doug was not an author). Needs some more investigation.....
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          (From update of attachment 15346)
          Does not work - have rewritten

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - (From update of attachment 15346) Does not work - have rewritten
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          Created an attachment (id=15371)
          CoordConstrainedBooleanQuery

          Fixed issues so will work when nested in other Boolean queries - thanks to Paul
          for the scorer tips.

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - Created an attachment (id=15371) CoordConstrainedBooleanQuery Fixed issues so will work when nested in other Boolean queries - thanks to Paul for the scorer tips.
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          (From update of attachment 15347)
          New version supports analysis better

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - (From update of attachment 15347) New version supports analysis better
          Hide
          markharw00d@yahoo.co.uk Mark Harwood added a comment -

          Created an attachment (id=15372)
          CustomQueryParserExample

          Added automatic support for preserving min_coord numeric parameters in query
          string whatever users' choice of analyzer is

          Show
          markharw00d@yahoo.co.uk Mark Harwood added a comment - Created an attachment (id=15372) CustomQueryParserExample Added automatic support for preserving min_coord numeric parameters in query string whatever users' choice of analyzer is
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          In the trunk this could be implemented by adding a minimumNrShouldMatch
          attribute to BooleanQuery and passing this as a minimum number of
          matchers to the DisjunctionSumScorer that is used to combine the
          clauses that should match, ie. the optional clauses.
          This minimumNrShouldMatchwould would have a role similar to the minimum
          required overlap in the CoordConstrainedBooleanQuery here,
          when disregarding the required and prohibited clauses.

          Regards,
          Paul Elschot

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - In the trunk this could be implemented by adding a minimumNrShouldMatch attribute to BooleanQuery and passing this as a minimum number of matchers to the DisjunctionSumScorer that is used to combine the clauses that should match, ie. the optional clauses. This minimumNrShouldMatchwould would have a role similar to the minimum required overlap in the CoordConstrainedBooleanQuery here, when disregarding the required and prohibited clauses. Regards, Paul Elschot
          Hide
          hossman Hoss Man added a comment -

          I don't really understand BooleanScorer2 very much, but i thought I understaood it enough to take a stab at the changes paul suggested to incorperate this type offunctionality into BooleanQuery using DisjunctionSumScorer.

          Not suprisingly, some of the new tests I wrote for this functionality don't pass. I can't figure out what the problem is. perhaps It will come to me later, but in the mean time I wanted to post what I had in case the problem is something really silly and trivial that I'm overlooking.

          Show
          hossman Hoss Man added a comment - I don't really understand BooleanScorer2 very much, but i thought I understaood it enough to take a stab at the changes paul suggested to incorperate this type offunctionality into BooleanQuery using DisjunctionSumScorer. Not suprisingly, some of the new tests I wrote for this functionality don't pass. I can't figure out what the problem is. perhaps It will come to me later, but in the mean time I wanted to post what I had in case the problem is something really silly and trivial that I'm overlooking.
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          I glanced through the code quickly, I'll give it a try in a week or so.

          BooleanScorer2 has one catch for which I did not see a check in the patch:
          A) when there is at least one required clause, none of the optional clauses need to match, but
          B) when there are no required clauses, at least one of the optional clauses needs to match.

          From what I saw (very quickly), only case B is implemented,
          but what should happen in case A?

          Regards,
          Paul Elschot

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - I glanced through the code quickly, I'll give it a try in a week or so. BooleanScorer2 has one catch for which I did not see a check in the patch: A) when there is at least one required clause, none of the optional clauses need to match, but B) when there are no required clauses, at least one of the optional clauses needs to match. From what I saw (very quickly), only case B is implemented, but what should happen in case A? Regards, Paul Elschot
          Hide
          hossman Hoss Man added a comment -

          Paul: i thought i had accounted for both situations, but you're right – it's not enough to make sure that all of the existing usages of DisjunctionSumScorer now use the min cut-off, because when there are required scorers, the existing uses of DisjunctionSumScorer are allways treated as optional. the internal logic in makeCountingSumScorer basically needs to be "turned inside out" so that if the min value is set, a DisjunctionSumScorer is constructed, and treated as a required scorer.

          I started trying to do this but i haven't really made any progress, and now i have to go get on a plane, so unless someone else takes a stab at it, I'll try to post an updated patch on monday.

          in the mean time, here is a revised version of the test class that accounts for more possible code paths in the existing logic (particularly, the difference in behavior when there is one required vs more then one required, and/or one prohibited vs more then one prohibited.

          Show
          hossman Hoss Man added a comment - Paul: i thought i had accounted for both situations, but you're right – it's not enough to make sure that all of the existing usages of DisjunctionSumScorer now use the min cut-off, because when there are required scorers, the existing uses of DisjunctionSumScorer are allways treated as optional. the internal logic in makeCountingSumScorer basically needs to be "turned inside out" so that if the min value is set, a DisjunctionSumScorer is constructed, and treated as a required scorer. I started trying to do this but i haven't really made any progress, and now i have to go get on a plane, so unless someone else takes a stab at it, I'll try to post an updated patch on monday. in the mean time, here is a revised version of the test class that accounts for more possible code paths in the existing logic (particularly, the difference in behavior when there is one required vs more then one required, and/or one prohibited vs more then one prohibited.
          Hide
          hossman Hoss Man added a comment -

          Okay - a new patch, and this one (amazingly) seems to work. All existing tests in SVN pass, as well as the new ones I've attached to this bug.

          The one thing still not quite accounted for is what the behavior should be if/when clients set the minNrShouldMatch to be a value higher then the number of optional clauses – or what the behavior should be if there is only one optional clause, and minNrShouldMatch==1 ... should it be treated as if it's required, or is that a miss-use of the api? Should these cases be checked in BooleanScorer2, or in BooleanQuery?

          Show
          hossman Hoss Man added a comment - Okay - a new patch, and this one (amazingly) seems to work. All existing tests in SVN pass, as well as the new ones I've attached to this bug. The one thing still not quite accounted for is what the behavior should be if/when clients set the minNrShouldMatch to be a value higher then the number of optional clauses – or what the behavior should be if there is only one optional clause, and minNrShouldMatch==1 ... should it be treated as if it's required, or is that a miss-use of the api? Should these cases be checked in BooleanScorer2, or in BooleanQuery?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          > if there is only one optional clause, and minNrShouldMatch==1 ... should it be treated as if it's required

          That would be my vote... it requires less special case checking when generating queries.

          > if/when clients set the minNrShouldMatch to be a value higher then the number of optional clauses

          Either treat all optional clauses as required, or don't match any clauses. I guess I would lean toward being literal and not match anything.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - > if there is only one optional clause, and minNrShouldMatch==1 ... should it be treated as if it's required That would be my vote... it requires less special case checking when generating queries. > if/when clients set the minNrShouldMatch to be a value higher then the number of optional clauses Either treat all optional clauses as required, or don't match any clauses. I guess I would lean toward being literal and not match anything.
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          I agree with Yonik here.

          > Should these cases be checked in BooleanScorer2, or in BooleanQuery?

          The easiest place is BooleanScorer2 since it already does a lot of checking to decide
          which scorers to use. It has this at line 180:
          return new NonMatchingScorer(); // only prohibited scorers
          which is a nice fit for the case when minNrShouldMatch is too high.

          I'll try the new code in a few days.

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - I agree with Yonik here. > Should these cases be checked in BooleanScorer2, or in BooleanQuery? The easiest place is BooleanScorer2 since it already does a lot of checking to decide which scorers to use. It has this at line 180: return new NonMatchingScorer(); // only prohibited scorers which is a nice fit for the case when minNrShouldMatch is too high. I'll try the new code in a few days.
          Hide
          hossman Hoss Man added a comment -

          Here is:

          1) Revised version of TestBooleanMinShouldMatch with new methods to to test for the border cases of:
          a) min is higher then the number of optional clauses
          b) min is equal to the number of optional clauses
          c) min and number of clauses are both 1
          d) there are no optional clauses, but a min is specified

          2) Revised version of patch that works against these new tests

          Show
          hossman Hoss Man added a comment - Here is: 1) Revised version of TestBooleanMinShouldMatch with new methods to to test for the border cases of: a) min is higher then the number of optional clauses b) min is equal to the number of optional clauses c) min and number of clauses are both 1 d) there are no optional clauses, but a min is specified 2) Revised version of patch that works against these new tests
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          The patch applies cleanly and all tests pass including the new TestBooleanMinShouldMatch.

          The code to tranform the optional scorers into a single required one with a minimum number of matches
          looks good, it's easy to understand and the tests pass.
          I'm not really happy with the place of this code, it would be better in the initCountingSumScorer()
          method of BooleanScorer2.

          Also I don't think maxCoord should be increased when this extra required scorer is introduced,
          because this new required scorer can be completely ignored for coordination, it is only a way to
          get the minimum number of required scorers. I'm not sure about this, though.
          Perhaps a testcase could be added that compares the score with and without a minimum
          number of matches on documents that match anyway.

          I'll keep this in my working copy and work on the code a bit, probably during next week.
          Please feel free to beat me to it.

          Regards,
          Paul Elschot

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - The patch applies cleanly and all tests pass including the new TestBooleanMinShouldMatch. The code to tranform the optional scorers into a single required one with a minimum number of matches looks good, it's easy to understand and the tests pass. I'm not really happy with the place of this code, it would be better in the initCountingSumScorer() method of BooleanScorer2. Also I don't think maxCoord should be increased when this extra required scorer is introduced, because this new required scorer can be completely ignored for coordination, it is only a way to get the minimum number of required scorers. I'm not sure about this, though. Perhaps a testcase could be added that compares the score with and without a minimum number of matches on documents that match anyway. I'll keep this in my working copy and work on the code a bit, probably during next week. Please feel free to beat me to it. Regards, Paul Elschot
          Hide
          hossman Hoss Man added a comment -

          I freely admit that I didn't give a lot of thought to incrimenting maxCoord. I did it only because if I didn't, coordinator.coordFactor() generated ArrayIndexOutOfBoundsExceptions – it assumes that the number of counted matches (nrMatchers) should be used to determine which element of coordFactor should be used, so I assumed that maxCoord allways needed to be equal the the maximum number of potential counted matches.

          Likewise, I put the code in Coordinator.init() instead of initCountingSumScorer() for the sole reason that since i was incrimenting maxCoord, I needed to do it before coordFactors[] was initialized. (allthough, I suppose it could have happened in initCountingSumScorer(), prior to the call to coordinator.init()).

          Did I mention I'm way over my head as far as this patch goes? ... I don't honestly understand the meaning/purpose of "coordination" – I just kind of did what I did because it seemed to work (and made some sense in the context of hte existing code).

          Show
          hossman Hoss Man added a comment - I freely admit that I didn't give a lot of thought to incrimenting maxCoord. I did it only because if I didn't, coordinator.coordFactor() generated ArrayIndexOutOfBoundsExceptions – it assumes that the number of counted matches (nrMatchers) should be used to determine which element of coordFactor should be used, so I assumed that maxCoord allways needed to be equal the the maximum number of potential counted matches. Likewise, I put the code in Coordinator.init() instead of initCountingSumScorer() for the sole reason that since i was incrimenting maxCoord, I needed to do it before coordFactors[] was initialized. (allthough, I suppose it could have happened in initCountingSumScorer(), prior to the call to coordinator.init()). Did I mention I'm way over my head as far as this patch goes? ... I don't honestly understand the meaning/purpose of "coordination" – I just kind of did what I did because it seemed to work (and made some sense in the context of hte existing code).
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          Some refactoring to make things more compact, no functional changes.

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - Some refactoring to make things more compact, no functional changes.
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          This a continuation of the earlier patch.
          The basic ideas are now built deep into the logic of determining the actual scorer to be used.
          For this logic needed quite a few changes, simplifying in many places, but also
          one more case (a conjunction scorer over (a) the required scorers and (b) a disjunction
          scorer over optional scorers with a minimum number that should match).

          This obsoletes another patch that splits off the coordination logic when it is not needed.
          I'll note this there later.

          Some tests for unchanged scoring behaviour are still needed, but those can
          be added later, since this is probably going to work well.

          Again, thanks for this idea and for the test cases.

          Regards,
          Paul Elschot

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - This a continuation of the earlier patch. The basic ideas are now built deep into the logic of determining the actual scorer to be used. For this logic needed quite a few changes, simplifying in many places, but also one more case (a conjunction scorer over (a) the required scorers and (b) a disjunction scorer over optional scorers with a minimum number that should match). This obsoletes another patch that splits off the coordination logic when it is not needed. I'll note this there later. Some tests for unchanged scoring behaviour are still needed, but those can be added later, since this is probably going to work well. Again, thanks for this idea and for the test cases. Regards, Paul Elschot
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          The code that is obsoleted by this BooleanScorer2 is at issue LUCENE-364.

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - The code that is obsoleted by this BooleanScorer2 is at issue LUCENE-364 .
          Hide
          hossman Hoss Man added a comment -

          fixed a one character typo in a test name that caused the test to not be run...

          Show
          hossman Hoss Man added a comment - fixed a one character typo in a test name that caused the test to not be run...
          Hide
          hossman Hoss Man added a comment -

          To recap:

          • The orriginal bug proposed a new Query class, and provided a sample query parser that leveraged it.
          • subsequent discussions have led to a patch for BooleanQuery and BooleanScorer to support this functionality without needing a new class

          Looking at the example query parser for the first time today, I don't think it's a very clean way to proceed - if for no other reason then those provided by the other...

          //Note (mis)use of fieldname "min_coord" to pass the required parameter - a more formal
          //support for passing query function_names and parameters to user-defined query factories
          //would be a very useful addition to the QueryParser base class

          One idea that occured to me earlier today, is that "~" could be used to indicate the minNrShouldMatch. This occured to me becuase specifying a "tolerance" in the number of optional subclauses seems similar to me to specifying the slop in a phrase query.

          I don't have a patch for this, i just wanted to post it (in case i get hit by a bus).

          Show
          hossman Hoss Man added a comment - To recap: The orriginal bug proposed a new Query class, and provided a sample query parser that leveraged it. subsequent discussions have led to a patch for BooleanQuery and BooleanScorer to support this functionality without needing a new class Looking at the example query parser for the first time today, I don't think it's a very clean way to proceed - if for no other reason then those provided by the other... //Note (mis)use of fieldname "min_coord" to pass the required parameter - a more formal //support for passing query function_names and parameters to user-defined query factories //would be a very useful addition to the QueryParser base class One idea that occured to me earlier today, is that "~" could be used to indicate the minNrShouldMatch. This occured to me becuase specifying a "tolerance" in the number of optional subclauses seems similar to me to specifying the slop in a phrase query. I don't have a patch for this, i just wanted to post it (in case i get hit by a bus).
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          Another way would be to override getBooleanQuery in QueryParser
          and check there whether one of the given clauses has a special form,
          for example _min2 or _min3 indicating the minimum number of remaining
          clauses that should match for this boolean query.

          Regards,
          Paul Elschot

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - Another way would be to override getBooleanQuery in QueryParser and check there whether one of the given clauses has a special form, for example _min2 or _min3 indicating the minimum number of remaining clauses that should match for this boolean query. Regards, Paul Elschot
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I've been reviewing this patch along with BooleanScorer2 in general since It's my first exposure to it (it's certaily more complex than the original).

          There are so many decision points in the creation of the scorers that I wrote a test to create random boolean queries to verify that things match the original.
          One thing I found is that when minNrShou ldMatch>0 with nested boolean queries, the scores may not match for those documents that match the same query with minNrShou ldMatch=0. Is this expected?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I've been reviewing this patch along with BooleanScorer2 in general since It's my first exposure to it (it's certaily more complex than the original). There are so many decision points in the creation of the scorers that I wrote a test to create random boolean queries to verify that things match the original. One thing I found is that when minNrShou ldMatch>0 with nested boolean queries, the scores may not match for those documents that match the same query with minNrShou ldMatch=0. Is this expected?
          Hide
          hossman Hoss Man added a comment -

          : One thing I found is that when minNrShouldMatch>0 with
          : nested boolean queries, the scores may not match for those
          : documents that match the same query with
          : minNrShouldMatch=0. Is this expected?

          ...hmm. assuming you mean that you only set minNrShouldMatch in the root query, and the nested queries are vanilla BooleanQueries – then no, that doesn't sound expected.

          If the subqueries had minNrShouldMatch set (regardless of wether or not the root query did), then yes i can imagine some cases where the scores would be different (even if the result set is the same). Consider the following nesting scenerio...

          ( (A B X) (C D Y) (E W Z) )

          If document #42 matches A, B, C, D, and E, but does NOT match X, Y or Z, then it will easily match this query. Even if all of the subqueries are modified so that minNrShouldMatch=2, the result set will be the same – but the score for document#42 SHOULD be different, because it no longer matches enough individual clauses in the third sub query to get a score contribution from it (ie: the match on E no longer improves the score)

          But, if all you do is set minNrShouldMatch > 0 on the outermost query ... I can't think of any reason why the score would chance if the document still matches (other then a bug)

          Can you post your test?

          Show
          hossman Hoss Man added a comment - : One thing I found is that when minNrShouldMatch>0 with : nested boolean queries, the scores may not match for those : documents that match the same query with : minNrShouldMatch=0. Is this expected? ...hmm. assuming you mean that you only set minNrShouldMatch in the root query, and the nested queries are vanilla BooleanQueries – then no, that doesn't sound expected. If the subqueries had minNrShouldMatch set (regardless of wether or not the root query did), then yes i can imagine some cases where the scores would be different (even if the result set is the same). Consider the following nesting scenerio... ( (A B X) (C D Y) (E W Z) ) If document #42 matches A, B, C, D, and E, but does NOT match X, Y or Z, then it will easily match this query. Even if all of the subqueries are modified so that minNrShouldMatch=2, the result set will be the same – but the score for document#42 SHOULD be different, because it no longer matches enough individual clauses in the third sub query to get a score contribution from it (ie: the match on E no longer improves the score) But, if all you do is set minNrShouldMatch > 0 on the outermost query ... I can't think of any reason why the score would chance if the document still matches (other then a bug) Can you post your test?
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          >There are so many decision points in the creation of the scorers that I wrote a test to
          > create random boolean queries to verify that things match the original.

          Thank you very much. I think there are not enough tests for boolean queries, so
          such tests are very welcome.

          > One thing I found is that when minNrShou ldMatch>0 with nested boolean queries, the
          > scores may not match for those documents that match the same query with
          > minNrShouldMatch=0. Is this expected?

          Yes, see also Hoss Mann's comment.
          This can happen in the case that a nested query (eg. (E W Z) above) does
          not match because of its minNrShould match,
          while the full query still matches because it has enough
          matching subqueries left.
          In that case the full query has a smaller score because fewer of its subqueries match.

          Btw. once this has settled I'd like to redo the splitting off the coordation part when
          it is not needed, but that is only for performance.

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - >There are so many decision points in the creation of the scorers that I wrote a test to > create random boolean queries to verify that things match the original. Thank you very much. I think there are not enough tests for boolean queries, so such tests are very welcome. > One thing I found is that when minNrShou ldMatch>0 with nested boolean queries, the > scores may not match for those documents that match the same query with > minNrShouldMatch=0. Is this expected? Yes, see also Hoss Mann's comment. This can happen in the case that a nested query (eg. (E W Z) above) does not match because of its minNrShould match, while the full query still matches because it has enough matching subqueries left. In that case the full query has a smaller score because fewer of its subqueries match. Btw. once this has settled I'd like to redo the splitting off the coordation part when it is not needed, but that is only for performance.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Thanks for the scoring clarification guys... if I only set minNrShouldMatch>0 on the top level query, all the scores match!

          Changes:

          • added BooleanQuery.getMinimumNumberShouldMatch()
          • added the syntax Chris (Hoss) suggested to BooleanQuery.toString()... but it's not too late to change if someone has a better idea
          • added random boolean query testing with variable minMrShouldMatch to TestBooleanMinShouldMatch
          Show
          yseeley@gmail.com Yonik Seeley added a comment - Thanks for the scoring clarification guys... if I only set minNrShouldMatch>0 on the top level query, all the scores match! Changes: added BooleanQuery.getMinimumNumberShouldMatch() added the syntax Chris (Hoss) suggested to BooleanQuery.toString()... but it's not too late to change if someone has a better idea added random boolean query testing with variable minMrShouldMatch to TestBooleanMinShouldMatch
          Hide
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          Patch to TestBoolean2.java to use try/finally to reset the static useScorer14 flag in BooleanQuery
          even when a test fails.
          This has bitten me once when one of the tests started failing...

          Show
          paul.elschot@xs4all.nl Paul Elschot added a comment - Patch to TestBoolean2.java to use try/finally to reset the static useScorer14 flag in BooleanQuery even when a test fails. This has bitten me once when one of the tests started failing...
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          attaching new BooleanQuery.patch that fixes a toString() typo and syncs with head.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - attaching new BooleanQuery.patch that fixes a toString() typo and syncs with head.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I like this patch, and I'd like to see it committed before 1.9 is released.
          Only two outstanding issues:
          1) Deciding on a syntax and adding QueryParser support... I think that's a separate issue and can be addressed later.
          2) Performance optimizations when coord isn't needed... that also looks like a separate issue that can be dealt with later.

          So it looks to me like this is ready to be committed. Does anyone see a reason why it shouldn't be?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I like this patch, and I'd like to see it committed before 1.9 is released. Only two outstanding issues: 1) Deciding on a syntax and adding QueryParser support... I think that's a separate issue and can be addressed later. 2) Performance optimizations when coord isn't needed... that also looks like a separate issue that can be dealt with later. So it looks to me like this is ready to be committed. Does anyone see a reason why it shouldn't be?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          fixed BooleanQuery hashCode/equals and committed patches.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - fixed BooleanQuery hashCode/equals and committed patches.

            People

            • Assignee:
              yseeley@gmail.com Yonik Seeley
              Reporter:
              markharw00d@yahoo.co.uk Mark Harwood
            • Votes:
              2 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development