Lucene - Core
  1. Lucene - Core
  2. LUCENE-383

ConstantScoreRangeQuery - fixes "too many clauses" exception

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.9, 2.0.0
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      ConstantScoreQuery wraps a filter (representing a set of documents) and returns
      a constant score for each document in the set.

      ConstantScoreRangeQuery implements a RangeQuery that works for any number of
      terms in the range. It rewrites to a ConstantScoreQuery that wraps a RangeFilter.

      Still needed:

      • unit tests (these classes have been tested and work fine in-house, but the
        current tests rely on too much application specific code)
      • code review of Weight() implementation (I'm unsure If I got all the score
        normalization stuff right)
      • explain() implementation

      NOTE: requires Java 1.4 for BitSet.nextSetBit()

      1. TestConstantScoreRangeQuery.java
        11 kB
        Hoss Man
      2. ConstantScoreRangeQuery.java
        5 kB
        Yonik Seeley
      3. ConstantScoreQuery.java
        5 kB
        Yonik Seeley
      4. ASF.LICENSE.NOT.GRANTED--ConstantScoreRangeQuery.java
        6 kB
        Yonik Seeley
      5. ASF.LICENSE.NOT.GRANTED--ConstantScoreQuery.java
        4 kB
        Yonik Seeley

        Activity

        Hide
        Yonik Seeley added a comment -

        Created an attachment (id=14877)
        ConstantScoreQuery

        Show
        Yonik Seeley added a comment - Created an attachment (id=14877) ConstantScoreQuery
        Hide
        Yonik Seeley added a comment -

        Created an attachment (id=14878)
        ConstantScoreRangeQuery

        Show
        Yonik Seeley added a comment - Created an attachment (id=14878) ConstantScoreRangeQuery
        Hide
        Paul Elschot added a comment -

        The ConstantScoreQuery requires java 1.4 for its skipTo() implementation,
        so this will have to wait until java 1.4 is a minimum requirement for
        Lucene.
        Some time ago there was a bit of discussion on mixed boolean and
        scored queries, and I think the ConstantScoreQuery is a wellcome
        addition for such mixed queries.
        One problem with the current score is that it is not really bound
        to a maximum, so it is difficult to choose a good constant score
        value in a mix.
        Perhaps that aspect of scoring should be tackled in the future.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - The ConstantScoreQuery requires java 1.4 for its skipTo() implementation, so this will have to wait until java 1.4 is a minimum requirement for Lucene. Some time ago there was a bit of discussion on mixed boolean and scored queries, and I think the ConstantScoreQuery is a wellcome addition for such mixed queries. One problem with the current score is that it is not really bound to a maximum, so it is difficult to choose a good constant score value in a mix. Perhaps that aspect of scoring should be tackled in the future. Regards, Paul Elschot
        Hide
        Yonik Seeley added a comment -

        Refresh:

        • implemented explain()
        • changed scoring so that score is simply the boost
        • fixed bug in scoring (weight normalization)
        • fixed bug where boost was not copied during rewrite
        Show
        Yonik Seeley added a comment - Refresh: implemented explain() changed scoring so that score is simply the boost fixed bug in scoring (weight normalization) fixed bug where boost was not copied during rewrite
        Hide
        Yonik Seeley added a comment -

        > One problem with the current score is that it is not really bound
        > to a maximum, so it is difficult to choose a good constant score
        > value in a mix.

        tf() and lengthNorm() factors will vary by index, but I think idf() may be the most problematic since it can change over time as the index grows.

        Would it be better to add something like an idf(1,maxDocs()) factor?

        Show
        Yonik Seeley added a comment - > One problem with the current score is that it is not really bound > to a maximum, so it is difficult to choose a good constant score > value in a mix. tf() and lengthNorm() factors will vary by index, but I think idf() may be the most problematic since it can change over time as the index grows. Would it be better to add something like an idf(1,maxDocs()) factor?
        Hide
        Paul Elschot added a comment -

        Since the constant score is taken from the query boost, idf issues
        can be dealt with elsewhere. IOW I don't think there is a need to
        deal with idf here.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Since the constant score is taken from the query boost, idf issues can be dealt with elsewhere. IOW I don't think there is a need to deal with idf here. Regards, Paul Elschot
        Hide
        Doug Cutting added a comment -

        I like this patch. It would be nice to have a unit test, and we need to agree that Lucene 1.9 will require java 1.4 (which seems reasonable to me) before we commit it.

        Once this is committed, should we consider changing the query parser to use ConstantScoreRangeQuery for range queries? That alone would probably solve the majority of too-many-clauses problems.

        Show
        Doug Cutting added a comment - I like this patch. It would be nice to have a unit test, and we need to agree that Lucene 1.9 will require java 1.4 (which seems reasonable to me) before we commit it. Once this is committed, should we consider changing the query parser to use ConstantScoreRangeQuery for range queries? That alone would probably solve the majority of too-many-clauses problems.
        Hide
        Yonik Seeley added a comment -

        The functionality is definitely needed, I'm just not sure of the API yet.
        I also have a ConstantScorePrefixQuery/PrefixFilter that just needs to be cleaned up a bit before it can be contributed.

        Doug, you had brought up the possibility of having a flag on Query indicating if it was constant scoring or not... it would be simpler than reproducing Constant versions of all the query types.
        The downside of that is the whole issue of query immutability... if you set the constant-scoring flag on the root query, it really needs to set it on all of it's sub queries (and hence changes them). You can easily get into trouble if a query is shared.

        To avoid changing queries, you would have to follow the same strategy as rewrite()...
        You would need to clone the query object, then set constantScoring, then call rewrite (which may clone it again). Not the best for performance...

        One solution is to communicate the fact that we want a constant scoring query in the rewrite method itself: rewrite(Query query, int flags)

        Show
        Yonik Seeley added a comment - The functionality is definitely needed, I'm just not sure of the API yet. I also have a ConstantScorePrefixQuery/PrefixFilter that just needs to be cleaned up a bit before it can be contributed. Doug, you had brought up the possibility of having a flag on Query indicating if it was constant scoring or not... it would be simpler than reproducing Constant versions of all the query types. The downside of that is the whole issue of query immutability... if you set the constant-scoring flag on the root query, it really needs to set it on all of it's sub queries (and hence changes them). You can easily get into trouble if a query is shared. To avoid changing queries, you would have to follow the same strategy as rewrite()... You would need to clone the query object, then set constantScoring, then call rewrite (which may clone it again). Not the best for performance... One solution is to communicate the fact that we want a constant scoring query in the rewrite method itself: rewrite(Query query, int flags)
        Hide
        Daniel Naber added a comment -

        Doug: actually we agreed to require Java 1.4 some time ago already. You can find that dicussion somewhere in the mail archives.

        Show
        Daniel Naber added a comment - Doug: actually we agreed to require Java 1.4 some time ago already. You can find that dicussion somewhere in the mail archives.
        Hide
        Hoss Man added a comment -

        I hadn't heard about Doug's plans that Yonik refered to (to revamp all Query classes to support a constant score option) so forgive me if this is way off the mark:

        If Doug's plan is fairly solid and just needs to be cranked out, then by all means it might make sense to go that route intstead of having seperate ConstantScoreRange and ConstantScorePrefix queries ... but if that plan is still very hypotheitcal, then perhaps the best course of action would be to commit Yonik's existing code into the contrib section.

        Nothing here requires any changes to the core codebase, and as of 1.9 the contrib code will start being reved/released on teh same schedule as the core correct? ... so there's really no downside to putting it in contrib. If the other idea falls through, then this code could be "promoted" from contrib to the core (and perhaps then QueryParser could be changed to use it by default). If the other plan does get implimented, then these classes can be deprecated in favor of the new ones (and their new API)

        Show
        Hoss Man added a comment - I hadn't heard about Doug's plans that Yonik refered to (to revamp all Query classes to support a constant score option) so forgive me if this is way off the mark: If Doug's plan is fairly solid and just needs to be cranked out, then by all means it might make sense to go that route intstead of having seperate ConstantScoreRange and ConstantScorePrefix queries ... but if that plan is still very hypotheitcal, then perhaps the best course of action would be to commit Yonik's existing code into the contrib section. Nothing here requires any changes to the core codebase, and as of 1.9 the contrib code will start being reved/released on teh same schedule as the core correct? ... so there's really no downside to putting it in contrib. If the other idea falls through, then this code could be "promoted" from contrib to the core (and perhaps then QueryParser could be changed to use it by default). If the other plan does get implimented, then these classes can be deprecated in favor of the new ones (and their new API)
        Hide
        Yonik Seeley added a comment -
        Show
        Yonik Seeley added a comment - Doug's suggestion is here: http://www.mail-archive.com/java-dev@lucene.apache.org/msg00532.html
        Hide
        Doug Cutting added a comment -

        Yonik: I'd forgotten about that suggestion, thanks for remembering it!

        Hoss: I think there are some nasty details still to be worked out in that suggestion, like how to cache bitvectors. Long-term, I think it would be a better approach, but Yonik's ConstantScoreQuery stuff works well today.

        I think fixing common use cases, like range queries, from exploding in people's faces is a high priority. So I would argue for changing QueryParser sooner rather than later. If we ever implement my proposal then we can deprecate these classes. The majority of folks just use QueryParser, and they won't know the difference.

        Show
        Doug Cutting added a comment - Yonik: I'd forgotten about that suggestion, thanks for remembering it! Hoss: I think there are some nasty details still to be worked out in that suggestion, like how to cache bitvectors. Long-term, I think it would be a better approach, but Yonik's ConstantScoreQuery stuff works well today. I think fixing common use cases, like range queries, from exploding in people's faces is a high priority. So I would argue for changing QueryParser sooner rather than later. If we ever implement my proposal then we can deprecate these classes. The majority of folks just use QueryParser, and they won't know the difference.
        Hide
        Hoss Man added a comment -

        a Unit test for ConstantScoreRangeQuery ... that was easy to crank out based on the existing BaseTestRangeFilter.

        One catch is that the test currently fails: when i tried to prove that the scores were not only constant, but also equaled the boost I couldn't get it to work – perhaps I have a missunderstanding of how HItCollector works, but I thought that was the safest way to get a "raw" score.

        Show
        Hoss Man added a comment - a Unit test for ConstantScoreRangeQuery ... that was easy to crank out based on the existing BaseTestRangeFilter. One catch is that the test currently fails: when i tried to prove that the scores were not only constant, but also equaled the boost I couldn't get it to work – perhaps I have a missunderstanding of how HItCollector works, but I thought that was the safest way to get a "raw" score.
        Hide
        Yonik Seeley added a comment -

        Cool, thanks for the tests!

        You are getting the raw score from the HitCollector all right. The internal score is equal to the boost, but there is also a query normalization factor that's applied...

        Query.weight() calls Weight.sumOfSquaredWeights() which returns boost*boost.
        Then it calls Similarity.queryNorm(boost*boost) which returns 1/boost
        Then it calls Weight.normalize(1/boost), and I multiply 1/boost by boost and get 1.0

        Show
        Yonik Seeley added a comment - Cool, thanks for the tests! You are getting the raw score from the HitCollector all right. The internal score is equal to the boost, but there is also a query normalization factor that's applied... Query.weight() calls Weight.sumOfSquaredWeights() which returns boost*boost. Then it calls Similarity.queryNorm(boost*boost) which returns 1/boost Then it calls Weight.normalize(1/boost), and I multiply 1/boost by boost and get 1.0
        Hide
        Yonik Seeley added a comment -

        I have committed these classes.
        Unless I hear objections, I'll change the QueryParser to use ConstantScoreRangeQuery,
        and allow the specification of open-ended ranges using "*".

        Show
        Yonik Seeley added a comment - I have committed these classes. Unless I hear objections, I'll change the QueryParser to use ConstantScoreRangeQuery, and allow the specification of open-ended ranges using "*".
        Hide
        Nadav Har'El added a comment -

        Hi,

        It appears that ConstantScoreRangeQuery is already in the trunk.
        However, QueryParser still uses RangeQuery, not ConstantScoreRangeQuery.

        Should this issue be closed? Or are you still intending to change QueryParser to use the new class?

        Show
        Nadav Har'El added a comment - Hi, It appears that ConstantScoreRangeQuery is already in the trunk. However, QueryParser still uses RangeQuery, not ConstantScoreRangeQuery. Should this issue be closed? Or are you still intending to change QueryParser to use the new class?
        Hide
        Hoss Man added a comment -

        I believe (but i'm not certain) that i remember a discusion on java-dev a while back regarding this and that there was some concensus not to make the QueryParser change until 2.0 so as not to affect the scores of existing searches.

        i could be completely wrong

        Show
        Hoss Man added a comment - I believe (but i'm not certain) that i remember a discusion on java-dev a while back regarding this and that there was some concensus not to make the QueryParser change until 2.0 so as not to affect the scores of existing searches. i could be completely wrong
        Hide
        Doug Cutting added a comment -

        This has been fixed.

        Show
        Doug Cutting added a comment - This has been fixed.

          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Yonik Seeley
          • Votes:
            5 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development