Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      This disjunction scorer can match a minimum nr. of docs,
      it provides skipTo() and it uses skipTo() on the subscorers.
      The score() method is abstract in DisjunctionScorer and implemented
      in DisjunctionSumScorer as an example.
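
      A minimal sketch of the technique described above, not the attached code: a disjunction over sorted doc-id lists that keeps its subscorers in a PriorityQueue, reports a document only when at least a minimum number of subscorers match it, and uses skipTo() on the subscorers. The names DocIdCursor and MinMatchDisjunction are hypothetical, and plain int arrays stand in for real subscorers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a subscorer: walks a sorted array of doc ids. */
final class DocIdCursor {
    private final int[] docs;
    private int pos = -1;

    DocIdCursor(int... sortedDocs) { this.docs = sortedDocs; }

    /** Moves to the next doc; returns false when exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Moves to the first doc beyond the current one that is >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

/** Sketch of a disjunction that requires at least minMatchers matching subscorers. */
final class MinMatchDisjunction {
    // Cursors ordered by their current doc id, lowest first.
    private final PriorityQueue<DocIdCursor> queue = new PriorityQueue<DocIdCursor>(
            (DocIdCursor a, DocIdCursor b) -> Integer.compare(a.doc(), b.doc()));
    private final int minMatchers;
    private int currentDoc = -1;
    private int nrMatchers = 0;

    MinMatchDisjunction(List<DocIdCursor> subs, int minMatchers) {
        this.minMatchers = minMatchers;
        for (DocIdCursor c : subs) {
            if (c.next()) queue.add(c); // position each cursor on its first doc
        }
    }

    boolean next() {
        // With fewer than minMatchers cursors left, no further doc can qualify.
        while (!queue.isEmpty() && queue.size() >= minMatchers) {
            currentDoc = queue.peek().doc();
            nrMatchers = 0;
            // Count every cursor sitting on currentDoc, then advance it.
            while (!queue.isEmpty() && queue.peek().doc() == currentDoc) {
                DocIdCursor top = queue.poll();
                nrMatchers++;
                if (top.next()) queue.add(top);
            }
            if (nrMatchers >= minMatchers) return true;
        }
        return false;
    }

    boolean skipTo(int target) {
        // Move lagging cursors straight to target instead of stepping them.
        while (!queue.isEmpty() && queue.peek().doc() < target) {
            DocIdCursor top = queue.poll();
            if (top.skipTo(target)) queue.add(top);
        }
        return next();
    }

    int doc() { return currentDoc; }

    int nrMatchers() { return nrMatchers; }

    public static void main(String[] args) {
        MinMatchDisjunction d = new MinMatchDisjunction(Arrays.asList(
                new DocIdCursor(1, 3, 5, 9),
                new DocIdCursor(3, 5, 8),
                new DocIdCursor(3, 9, 10)), 2);
        while (d.next()) {
            System.out.println(d.doc() + " matched by " + d.nrMatchers());
        }
    }
}
```

      With a minimum of 2, the example in main() reports docs 3, 5 and 9; docs 1, 8 and 10 are dropped because only one list contains them.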

        Activity

        Paul Elschot added a comment -

        Created an attachment (id=13147)
        DisjunctionScorer.java

        Paul Elschot added a comment -

        Created an attachment (id=13148)
        DisjunctionSumScorer.java

        Paul Elschot added a comment -

        Created an attachment (id=13149)
        DisjunctionSumCoordScorer.java

        cutting@apache.org added a comment -

        This looks great to me!

        Have you had a chance to benchmark it against BooleanScorer? I'd expect it to
        be faster with rare terms and slower with common terms. Differences might not
        be significant on small indexes.

        Also, have you thought about keeping track of which scorers matched, so that
        this could implement boolean logic? For example, if we kept an int or long
        bitmask with bits for prohibited and/or required sub-scorers, then this could
        fully replace BooleanQuery.
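
        The bitmask idea could look roughly like the sketch below. This is only an illustration under assumptions: the class name ClauseMask and its methods are hypothetical, each sub-scorer is assumed to own one bit, and the bits of all sub-scorers matching the current document are assumed to be OR-ed together before the check. A long limits this to 64 clauses.

```java
/** Sketch: decide boolean acceptance of a document from a bitmask of matching sub-scorers. */
final class ClauseMask {
    private final long requiredMask;   // bits of sub-scorers that must match
    private final long prohibitedMask; // bits of sub-scorers that must not match

    ClauseMask(long requiredMask, long prohibitedMask) {
        this.requiredMask = requiredMask;
        this.prohibitedMask = prohibitedMask;
    }

    /** Bit assigned to the sub-scorer with the given index (0..63). */
    static long bit(int index) { return 1L << index; }

    /** True if no prohibited bit is set and every required bit is set. */
    boolean accepts(long matchedMask) {
        return (matchedMask & prohibitedMask) == 0
                && (matchedMask & requiredMask) == requiredMask;
    }

    public static void main(String[] args) {
        // Clause 0 is required, clause 1 is optional, clause 2 is prohibited.
        ClauseMask mask = new ClauseMask(bit(0), bit(2));
        System.out.println(mask.accepts(bit(0) | bit(1)));
        System.out.println(mask.accepts(bit(0) | bit(2)));
    }
}
```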

        Paul Elschot added a comment -

        I have not yet tested it on bigger indexes, sorry.
        The code is still under development; currently it prepares
        an array of scores for the abstract combineScores() method,
        with Float.NaN for subscorers not at the current document.
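
        As an illustration of that convention (a sketch only; the attached signature may differ, and the class names ScoreCombiner and SumCombiner are hypothetical), a summing combineScores() would skip the NaN entries:

```java
/** Hypothetical base class: combines per-subscorer scores for the current document. */
abstract class ScoreCombiner {
    /** subScores[i] is Float.NaN when subscorer i is not at the current document. */
    abstract float combineScores(float[] subScores);
}

/** Sums the scores of the subscorers that are at the current document. */
final class SumCombiner extends ScoreCombiner {
    @Override
    float combineScores(float[] subScores) {
        float sum = 0f;
        for (float s : subScores) {
            // Float.isNaN is needed here: comparing with == Float.NaN is always false.
            if (!Float.isNaN(s)) {
                sum += s;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] scores = {1.5f, Float.NaN, 2.0f};
        System.out.println(new SumCombiner().combineScores(scores));
    }
}
```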

        I don't see performance bottlenecks in the Java code,
        but I know I can't predict a profiler...

        I expect it to be somewhat slower than BooleanScorer.
        The main reason to implement it is that I need skipTo()
        to allow very sparse filters. So sparse that the advantage
        of skipTo() outweighs the disadvantage of the PriorityQueue.

        For required subscorers Lucene's ConjunctionScorer does well.
        For prohibited subscorers I have a scorer for required/excluded,
        which follows the required scorer and does skipTo() on the
        excluded scorer.
        For optional subscorers I'm using a required/optional scorer,
        which delays skipTo() on the optional subscorers until score()
        is called.
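
        A rough sketch of the required/excluded case, under assumptions: the classes SortedDocs and ReqExclSketch are hypothetical and work on sorted int arrays rather than real scorers. The sketch follows the required list and only moves the excluded list forward with skipTo() when it lags behind.

```java
/** Hypothetical stand-in for a scorer: walks a sorted array of doc ids. */
final class SortedDocs {
    private final int[] docs;
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }

    /** Moves to the first doc beyond the current one that is >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

/** Sketch: docs of the required list that do not occur in the excluded list. */
final class ReqExclSketch {
    private final SortedDocs required;
    private SortedDocs excluded;          // set to null once exhausted
    private boolean exclPositioned = false;

    ReqExclSketch(SortedDocs required, SortedDocs excluded) {
        this.required = required;
        this.excluded = excluded;
    }

    boolean next() {
        while (required.next()) {
            if (!isExcluded(required.doc())) return true;
        }
        return false;
    }

    private boolean isExcluded(int doc) {
        if (excluded == null) return false;
        if (!exclPositioned) {
            // First use: put the excluded list on its first doc.
            exclPositioned = true;
            if (!excluded.next()) {
                excluded = null;
                return false;
            }
        }
        // Only skip when the excluded list is behind the required doc.
        if (excluded.doc() < doc && !excluded.skipTo(doc)) {
            excluded = null;
            return false;
        }
        return excluded.doc() == doc;
    }

    int doc() { return required.doc(); }

    public static void main(String[] args) {
        ReqExclSketch s = new ReqExclSketch(
                new SortedDocs(1, 2, 4, 7, 9), new SortedDocs(2, 3, 7));
        while (s.next()) {
            System.out.println(s.doc());
        }
    }
}
```

        Once the excluded list is exhausted it is dropped, so the remaining required docs pass through without further checks.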

        These four scorers (Conjunction, Disjunction, ReqExcl, ReqOpt) can
        implement all boolean queries. They were designed for boolean
        operators (AND, OR, NOT and an operator for required/optional)
        so they don't fit directly on queries from Lucene's parser
        where each clause can be required/prohibited/optional.

        I wouldn't mind contributing the required/optional scorer and the
        required/excluded scorer as well.

        Paul Elschot added a comment -

        Created an attachment (id=13358)
        A replacement for BooleanQuery using, among others, DisjunctionScorer.

        Paul Elschot added a comment -

        The tgz attachment can be extracted in the top directory of
        a current lucene working copy. This will add new files and
        overwrite some existing ones, see below.

        I tried this new code on an index of around 1.4 MB,
        and saw no difference in performance. If someone else
        could report performance on a larger index,
        I'd be happy to hear how that went.

        New code, all under Apache Licence Version 2:

        Test for BooleanScorer2, with some example code
        overriding QueryParser to construct queries with
        BooleanQuery2:
        src/test/org/apache/lucene/search/TestBoolean2.java

        As BooleanQuery, but directly using BooleanScorer2:
        src/java/org/apache/lucene/search/BooleanQuery2.java

        Replacement for BooleanScorer, using the scorers below:
        src/java/org/apache/lucene/search/BooleanScorer2.java

        Counterpart of ConjunctionScorer, allowing a minimal nr of matchers:
        src/java/org/apache/lucene/search/DisjunctionSumScorer.java

        Helper for case of only prohibited subscorers:
        src/java/org/apache/lucene/search/NonMatchingScorer.java

        Extension to Scorer to allow coordination factor over
        multiple levels of subscorers:
        src/java/org/apache/lucene/search/NrMatchersScorer.java

        For required and prohibited subscorers:
        src/java/org/apache/lucene/search/ReqExclScorer.java

        For required and optional subscorers:
        src/java/org/apache/lucene/search/ReqOptSumScorer.java

        Some "Expert:" annotations may still be needed in the javadocs.

        Changes to existing code, also APL 2:

        Redirect BooleanQuery to BooleanScorer2 for testing
        with current Lucene tests. The tests pass with
        this modification. Not recommended for other purposes:
        src/java/org/apache/lucene/search/BooleanQuery.java

        Added some helping code for tests by TestBoolean2:
        src/test/org/apache/lucene/search/CheckHits.java

        ConjunctionScorer: explicit imports, extend NrMatchersScorer:
        src/java/org/apache/lucene/search/ConjunctionScorer.java

        Regards,
        Paul Elschot

        Paul Elschot added a comment -

        Created an attachment (id=13739)
        The replacement for BooleanScorer built into BooleanQuery

        Adding BooleanScorer2 to Lucene, 12 Dec 2004.

        The previous version of 8 Nov 2004 also contained BooleanQuery2,
        this is now merged into BooleanQuery.

        New code, all under Apache Licence Version 2, mostly unchanged
        from 8 November 2004:

        Note: some "Expert:" annotations may still be needed in the javadocs.

        Test for BooleanScorer, with some example code
        using both 1.4 scorer and new BooleanScorer2:
        src/test/org/apache/lucene/search/TestBoolean2.java

        Alternative for BooleanScorer, using the scorers below:
        src/java/org/apache/lucene/search/BooleanScorer2.java

        Counterpart of ConjunctionScorer, allowing a minimal nr of matchers:
        src/java/org/apache/lucene/search/DisjunctionSumScorer.java

        Helper for case of only prohibited subscorers:
        src/java/org/apache/lucene/search/NonMatchingScorer.java

        Extension to Scorer to allow coordination factor over
        multiple levels of subscorers, this could be merged
        into the current Scorer with a default of 1 for nrMatchers():
        src/java/org/apache/lucene/search/NrMatchersScorer.java

        For required and prohibited subscorers:
        src/java/org/apache/lucene/search/ReqExclScorer.java

        For required and optional subscorers:
        src/java/org/apache/lucene/search/ReqOptSumScorer.java

        Changes to existing code, all three changed from
        the previous version of 8 November 2004:

        Redirect BooleanQuery to BooleanScorer2 under
        control of static methods setUseScorer14 and getUseScorer14,
        default using BooleanScorer2, as requested by Doug:
        src/java/org/apache/lucene/search/BooleanQuery.java

        Added some helping code for tests by TestBoolean2.
        This uses the setUseScorer14 method to test both versions:
        src/test/org/apache/lucene/search/CheckHits.java

        ConjunctionScorer: explicit imports, extend NrMatchersScorer:
        src/java/org/apache/lucene/search/ConjunctionScorer.java

        Paul Elschot added a comment -

        Correction to a small mistake:
        The setUseScorer14 method is used in TestBoolean2.java
        and not in CheckHits.java.

        Christoph Goller added a comment -

        Hi Paul,

        I finally found time to look into your code in detail and I think
        it's really excellent work. Before committing it, I have a few questions.

        *) In your source files you have included a copyright statement referring
        to yourself. Of course you include the Apache License. However, I haven't seen
        other source files in Lucene with similar copyright statements. I don't know the
        legal consequences of that. Maybe someone else on the list knows more. The
        simplest solution would be to substitute "Copyright 2004 Paul Elschot" with
        "Copyright 2004 The Apache Software Foundation". Would you agree?

        *) BooleanScorer2 extends NrMatchersScorer and nrMatchers() always returns 1.
        Is there a reason for that? I think it should either only extend Scorer or
        deliver the correct values. I opt for extending Scorer only.

        *) All NrMatchersScorers except for BooleanScorer2 and ConjunctionScorer don't
        use a similarity implementation. They compute raw scores and nrMatches.
        ConjunctionScorer is a hybrid. It uses coord-factors and is used as
        NrMatchersScorer. This could lead to incorrect results with Similarity
        implementations other than DefaultSimilarity. A ConjunctionScorer used as
        NrMatchersScorer should compute raw scores; if used as a standard Scorer it
        should use coord-factors. How can we achieve this in an elegant way?

        Christoph

        cutting@apache.org added a comment -

        Yes, copyright must be assigned to the Apache Software Foundation.

        Is that okay, Paul?

        Paul Elschot added a comment -

        (In reply to comment #10)
        > Hi Paul,
        >
        > I finally found time to look into your code in detail and I think
        > it's really excellent work. Before committing it, I have a few questions.
        >
        > *) In your source files you have included a copyright statement referring
        > to yourself. Of course you include the Apache License. However, I haven't seen
        > other source files in Lucene with similar copyright statements. I don't know the
        > legal consequences of that. Maybe someone else on the list knows more. The
        > simplest solution would be to substitute "Copyright 2004 Paul Elschot" with
        > "Copyright 2004 The Apache Software Foundation". Would you agree?

        The intention is to allow the Apache Software Foundation to take over the
        copyright in case they want to.
        As I understand the Apache Licence, taking over the copyright is
        allowed by the licence. So I used my own copyright, and it could be changed
        when taken over into an Apache project.
        However, the relevant documentation
        http://apache.org/dev/apply-license.html
        says that contributed files should have the copyright
        assigned to the Apache Software Foundation.
        I'll try and do that the next time.
        Could you change the copyright notices accordingly this time?

        > *) BooleanScorer2 extends NrMatchersScorer and nrMatchers() always returns 1.
        > Is there a reason for that? I think it should either only extend Scorer or
        > deliver the correct values. I opt for extending Scorer only.

        The reason is that a BooleanQuery can be scored by a few cooperating
        (nested) scorers, and that it should still be possible to compute the
        coordination factor from the number of matching scorers of the originally
        added clauses.

        By default nrMatchers() returns 1, and this is for the case when the scorer is
        given to the BooleanScorer2 as a scorer of an added clause.
        (At the moment these are wrapped in a NrMatchersScorer. )
        The cooperating scorers implementing the boolean behaviour
        add these numbers for their subscorers to make it work in the same way
        as scoring a single BooleanQuery.
        The idea is to either sum nrMatchers(), or to use nrMatchers()
        for the coordination factor in the score and return 1 for nrMatchers().
        It might be worthwhile to add something like this in the javadocs.
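
        To make the two roles concrete, here is a small sketch. It is not the attached code: the names CountingScorer, ClauseScorer, SummingScorer and TopLevelScorer are hypothetical, each object stands for a scorer that matches the current document, and coord() is assumed to be overlap / maxOverlap as in DefaultSimilarity. Cooperating scorers sum nrMatchers() of their subscorers; the top level uses the sum for the coordination factor and itself returns 1.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical interface: a scorer that matches the current document. */
interface CountingScorer {
    float score();
    int nrMatchers(); // number of originally added clauses matching here
}

/** Scorer of a single added clause: counts as one matcher. */
final class ClauseScorer implements CountingScorer {
    private final float score;
    ClauseScorer(float score) { this.score = score; }
    public float score() { return score; }
    public int nrMatchers() { return 1; }
}

/** Cooperating scorer: sums both the raw scores and nrMatchers() of its subscorers. */
final class SummingScorer implements CountingScorer {
    private final List<CountingScorer> matching;
    SummingScorer(CountingScorer... matching) { this.matching = Arrays.asList(matching); }

    public float score() {
        float sum = 0f;
        for (CountingScorer s : matching) sum += s.score();
        return sum;
    }

    public int nrMatchers() {
        int n = 0;
        for (CountingScorer s : matching) n += s.nrMatchers();
        return n;
    }
}

/** Top level: applies the coordination factor once and reports 1 matcher itself. */
final class TopLevelScorer implements CountingScorer {
    private final CountingScorer root;
    private final int maxCoord; // total number of clauses that could match

    TopLevelScorer(CountingScorer root, int maxCoord) {
        this.root = root;
        this.maxCoord = maxCoord;
    }

    /** Assumed coordination factor: fraction of clauses that matched. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public float score() {
        return root.score() * coord(root.nrMatchers(), maxCoord);
    }

    public int nrMatchers() { return 1; }

    public static void main(String[] args) {
        CountingScorer nested = new SummingScorer(new ClauseScorer(1.0f), new ClauseScorer(2.0f));
        TopLevelScorer top = new TopLevelScorer(new SummingScorer(nested, new ClauseScorer(1.0f)), 4);
        System.out.println(top.score() + " with nrMatchers " + top.nrMatchers());
    }
}
```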

        > *) All NrMatchersScorers except for BooleanScorer2 and ConjunctionScorer don't
        > use a similarity implementation. They compute raw scores and nrMatches.
        > ConjunctionScorer is a hybrid. It uses coord-factors and is used as
        > NrMatchersScorer. This could lead to incorrect results with Similarity
        > implementations other than DefaultSimilarity. A ConjunctionScorer used as
        > NrMatchersScorer should compute raw scores; if used as a standard Scorer it
        > should use coord-factors. How can we achieve this in an elegant way?
        >
        > Christoph

        You're right that ConjunctionScorer has a double role here:
        it can be used as a full replacement for BooleanScorer when all clauses
        are required, and it can also be used to score only the required
        clauses combined with ReqOptScorer or ReqExclScorer for the other
        clauses.

        The implementation could only fail when ConjunctionScorer
        provides a nrMatchers bigger than 1, and computes the coordination
        factor into its score. The implementation prevents
        this by using a top level scorer that always returns 1 for nrMatchers,
        and uses nrMatchers() of its subscorers for the coordination factor.

        This is somewhat tricky, so I hope I got all the details right.

        It also means that the changed ConjunctionScorer should not multiply
        a coordination factor into its score() value. I don't remember
        whether or not it does that, but it shouldn't.

        One way to solve this would be to use another name for the changed
        ConjunctionScorer, or to explicitly document that it should be
        wrapped in a scorer that returns 1 for nrMatchers() when implementing
        a full BooleanQuery.

        Regards,
        Paul Elschot

        In case nrMatchers() is added to Scorer, this wrapping would not
        be necessary, and it should be documented that it is expected that
        the scorers for the clauses implement their own coordination factor
        into their score and return 1 for nrMatchers().
        There may be a better way to implement this 'decoupling'
        of the coordination factor from the cooperating scorers entirely within
        BooleanScorer2, for example by maintaining the
        number of matching subscorers in the top level scorer, invisible
        from the outside, and having all the cooperating scorers maintain
        this attribute of the top level scorer instead of their own.

        Paul Elschot added a comment -

        Created an attachment (id=14067)
        A patch to the current BooleanQuery that forms the "built into"

        Using this patch instead of BooleanQuery.java from attachment 13739
        also incorporates the intermediate javadoc addition to BooleanQuery.

        I think I can find some time in the coming weeks to remove the
        NrMatchersScorer from attachment 13739.
        So, in case this gets into the Lucene dev branch before that,
        please avoid using NrMatchersScorer and the nrMatchers() method
        it provides.

        Regards,
        Paul Elschot

        Paul Elschot added a comment -

        Created an attachment (id=14069)
        Reworked BooleanScorer2 to drop NrMatchersScorer.

        Adding BooleanScorer2 to Lucene, 22 Jan 2005.

        The main difference with the previous patch of 12 Dec 2004
        is that all counting of matching scorers is now done
        locally in BooleanScorer2 using a wrapper and a few
        inline subclasses.

        This made it possible to get rid of NrMatchersScorer completely.
        ConjunctionScorer is used as a summing scorer by passing
        it a default similarity.
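
        The wrapper approach can be pictured with the sketch below. It is an illustration under assumptions, not the code in the attachment: Coordinator, SimpleScorer and CountingWrapper are hypothetical names, and the coordination factor is assumed to be nrMatchers / maxCoord. Each clause scorer is wrapped so that scoring it bumps a counter owned by the top level, and the subscorers themselves no longer expose any count.

```java
/** Hypothetical per-query state owned by the top level scorer. */
final class Coordinator {
    final int maxCoord;  // number of clauses that could match
    int nrMatchers;      // clauses that matched the current document

    Coordinator(int maxCoord) { this.maxCoord = maxCoord; }

    /** Reset the count before scoring a new document. */
    void initDoc() { nrMatchers = 0; }

    /** Assumed coordination factor: fraction of clauses that matched. */
    float coordFactor() { return nrMatchers / (float) maxCoord; }
}

/** Hypothetical minimal scorer interface. */
interface SimpleScorer {
    float score();
}

/** Wraps a clause scorer and counts it as one matcher whenever it is scored. */
final class CountingWrapper implements SimpleScorer {
    private final SimpleScorer inner;
    private final Coordinator coordinator;

    CountingWrapper(SimpleScorer inner, Coordinator coordinator) {
        this.inner = inner;
        this.coordinator = coordinator;
    }

    public float score() {
        coordinator.nrMatchers++;
        return inner.score();
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator(4);
        SimpleScorer a = new CountingWrapper(() -> 1.0f, coordinator);
        SimpleScorer b = new CountingWrapper(() -> 2.0f, coordinator);

        coordinator.initDoc();
        float sum = a.score() + b.score();               // two of four clauses match
        System.out.println(sum * coordinator.coordFactor());
    }
}
```

        This sketch counts on every score() call, so it assumes each wrapper is scored at most once per document after initDoc().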

        The copyright is now assigned to the Apache Software Foundation.

        The implementation of explain() is still in its infancy.
        Some "Expert:" annotations may still be needed in the javadocs.

        The code, all under Apache Licence Version 2:

        Test for BooleanScorer, with some example code
        using both 1.4 scorer and new BooleanScorer2:
        src/test/org/apache/lucene/search/TestBoolean2.java
        (Functionality unchanged from 12 Dec 2004)

        Alternative for BooleanScorer, using the scorers below:
        src/java/org/apache/lucene/search/BooleanScorer2.java
        (Functionality changed heavily, see above).

        Counterpart of ConjunctionScorer, also allowing a
        minimal nr of matchers:
        src/java/org/apache/lucene/search/DisjunctionSumScorer.java
        (Functionality unchanged from 12 Dec 2004, keeping
        the nrMatchers() method but not implementing NrMatchersScorer.)

        Helper for case of only prohibited subscorers:
        src/java/org/apache/lucene/search/NonMatchingScorer.java
        I would suggest also using this in other places instead of
        a null Scorer; that is why it is in a separate Java file.

        For required and prohibited subscorers:
        src/java/org/apache/lucene/search/ReqExclScorer.java
        (Functionality unchanged from 12 Dec 2004, except
        for removing the nrMatchers() method.)

        For required and optional subscorers:
        src/java/org/apache/lucene/search/ReqOptSumScorer.java
        (Functionality unchanged from 12 Dec 2004, except
        for removing the nrMatchers() method. The score()
        method was simplified a bit.)

        Changes to existing code:

        Redirect BooleanQuery to BooleanScorer2 under
        control of static methods setUseScorer14 and getUseScorer14,
        default using BooleanScorer2, as requested by Doug:
        src/java/org/apache/lucene/search/BooleanQuery.java
        (Unchanged from the previous version of 12 Dec 2004)

        Added some helping code for tests by TestBoolean2.
        This uses the setUseScorer14 method to test both versions:
        src/test/org/apache/lucene/search/CheckHits.java
        (Unchanged from the previous version of 12 Dec 2004)

        ConjunctionScorer is no longer declared final,
        and the imports are explicit:
        src/java/org/apache/lucene/search/ConjunctionScorer.java

        Regards,
        Paul Elschot

        Christoph Goller added a comment -

        Eliminating NrMatchersScorer and putting all the coord handling into
        BooleanScorer2 is very elegant. I committed your patch.

        Christoph

        Paul Elschot added a comment -

        Created an attachment (id=14116)
        ReqExclScorer.java simplified

        This supersedes the recently introduced ReqExclScorer.java.
        It also makes the class package-private instead of public.

        Making the other scorers used by BooleanScorer2 package-private as well
        would be in line with the rest of the package.

        One thing is left to do in BooleanScorer2: an implementation
        of explain().

        Regards,
        Paul Elschot
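
        Editor's illustration, not taken from the attachment: ReqExclScorer
        combines a required subscorer with a prohibited one. The sketch below
        shows only the matching idea, with sorted int arrays standing in for
        the two subscorers. The class name ReqExclSketch and the method match()
        are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: documents matched by "required AND NOT excluded". */
class ReqExclSketch {
    /**
     * Both arrays must hold doc ids in ascending order, the way a scorer
     * would produce them. Returns the required docs that are not excluded.
     */
    static List<Integer> match(int[] required, int[] excluded) {
        List<Integer> hits = new ArrayList<>();
        int e = 0; // position in the excluded list, only ever moves forward
        for (int doc : required) {
            // Advance the excluded side to the first doc id >= doc,
            // which is the role skipTo() plays on a real scorer.
            while (e < excluded.length && excluded[e] < doc) {
                e++;
            }
            if (e == excluded.length || excluded[e] != doc) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

        Because both sides only move forward, each list is traversed at most
        once, whatever the number of excluded docs.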

        Paul Elschot added a comment -

        While trying to get explain() to work in BooleanScorer2, I stumbled
        on the following.
        In a first attempt I need to call TermScorer.explain(), which gives a
        smaller result than TermWeight.explain(): the latter also explains the
        query weight and the idf.

        The private TermWeight class in TermQuery.java has an explain() method
        that calls TermScorer.explain() to explain the term frequency.

        I'd like to move the explain() functionality from TermWeight to TermScorer to
        fix this. However, a similar situation exists for MultiPhraseQuery,
        PhrasePrefixQuery, PhraseQuery and SpanWeight, so that would be a lot of work.

        There are some alternatives:

        Keep the Weight close to each scorer in BooleanScorer2, which is also a lot
        of work.

        Use the existing BooleanScorer.explain() for BooleanScorer2 as well.
        This is the current situation.

        I prefer the last option.

        Regards,
        Paul Elschot.
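
        Editor's illustration of the layering described above, as a minimal
        sketch rather than the Lucene code: the weight-level explanation wraps
        the scorer-level one, so calling the scorer directly yields only the
        term frequency part. All class names ending in Sketch are hypothetical,
        and the product queryWeight * idf * tf is a deliberate simplification
        of the real scoring formula.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical explanation node: a value, a description, nested details. */
class ExplanationSketch {
    final float value;
    final String description;
    final List<ExplanationSketch> details = new ArrayList<>();

    ExplanationSketch(float value, String description) {
        this.value = value;
        this.description = description;
    }

    ExplanationSketch add(ExplanationSketch detail) {
        details.add(detail);
        return this;
    }
}

/** Scorer level: knows only the term frequency part. */
class TermScorerSketch {
    private final float tf;

    TermScorerSketch(float tf) {
        this.tf = tf;
    }

    ExplanationSketch explain(int doc) {
        return new ExplanationSketch(tf, "tf in doc " + doc);
    }
}

/** Weight level: adds query weight and idf around the scorer's explanation. */
class TermWeightSketch {
    private final float queryWeight;
    private final float idf;
    private final TermScorerSketch scorer;

    TermWeightSketch(float queryWeight, float idf, TermScorerSketch scorer) {
        this.queryWeight = queryWeight;
        this.idf = idf;
        this.scorer = scorer;
    }

    ExplanationSketch explain(int doc) {
        ExplanationSketch tfExpl = scorer.explain(doc); // delegate to the scorer
        float value = queryWeight * idf * tfExpl.value; // simplified combination
        return new ExplanationSketch(value, "weight for doc " + doc)
            .add(new ExplanationSketch(queryWeight, "queryWeight"))
            .add(new ExplanationSketch(idf, "idf"))
            .add(tfExpl);
    }
}
```

        A scorer that holds only subscorers, as BooleanScorer2 does, can reach
        the scorer-level explanation but not the query weight and idf, which
        is why the options above either move explain() down into the scorers
        or keep each Weight next to its scorer.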

        Paul Elschot added a comment -

        I was about to close this bug, but then I saw that the
        simplified ReqExclScorer.java of Jan 27 is not in the trunk.

        Regards,
        Paul Elschot


          People

          • Assignee:
            Lucene Developers
            Reporter:
            Paul Elschot
