Lucene - Core
LUCENE-323

[PATCH] MultiFieldQueryParser and BooleanQuery do not provide adequate support for queries across multiple fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.9
    • Component/s: core/queryparser
    • Labels:
      None
    • Environment:

      Operating System: Windows XP
      Platform: PC

      Description

      The attached test case demonstrates this problem and provides a fix:
      1. Use a custom similarity to eliminate all tf and idf effects, just to
      isolate what is being tested.
      2. Create two documents doc1 and doc2, each with two fields title and
      description. doc1 has "elephant" in title and "elephant" in description.
      doc2 has "elephant" in title and "albino" in description.
      3. Express query for "albino elephant" against both fields.
      Problems:
      a. MultiFieldQueryParser won't recognize either document as containing
      both terms, due to the way it expands the query across fields.
      b. Expressing query as "title:albino description:albino title:elephant
      description:elephant" will score both documents equivalently, since each
      matches two query terms.
      4. Comparison to MaxDisjunctionQuery and my method for expanding queries
      across fields. Using notation that () represents a BooleanQuery and ( | )
      represents a MaxDisjunctionQuery, "albino elephant" expands to:
      ( (title:albino | description:albino)
      (title:elephant | description:elephant) )
      This will recognize that doc2 has both terms matched while doc1 has only one
      term matched, scoring doc2 over doc1.

      Refinement note: the actual expansion for "albino elephant" that I use is:
      ( (title:albino | description:albino)~0.1
      (title:elephant | description:elephant)~0.1 )
      This causes the score of each MaxDisjunctionQuery to be the score of highest
      scoring MDQ subclause plus 0.1 times the sum of the scores of the other MDQ
      subclauses. Thus, doc1 gets some credit for also having "elephant" in the
      description but only 1/10 as much as doc2 gets for covering another query term
      in its description. If doc3 has "elephant" in title and both "albino"
      and "elephant" in the description, then with the actual refined expansion, it
      gets the highest score of all (whereas with pure max, without the 0.1, it
      would get the same score as doc2).
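      As a concrete check of the arithmetic above, the doc3 > doc2 > doc1 ranking claim can be reproduced with the bare scoring formula in plain Java. This is a sketch of the scoring math only, with every term match scoring 1.0 as under the neutralized Similarity; the `disMax` helper and class name are illustrative, not the Lucene API:

```java
// Illustrative scoring math only (not the Lucene API): a max-disjunction
// clause scores max(subscores) + tieBreaker * (sum of the other subscores).
public class DisMaxScoreSketch {
    public static float disMax(float tieBreaker, float... subScores) {
        float max = 0f, sum = 0f;
        for (float s : subScores) { max = Math.max(max, s); sum += s; }
        return max + tieBreaker * (sum - max); // partial credit for non-max matches
    }

    public static void main(String[] args) {
        float tie = 0.1f;
        // Each document's score = albino clause + elephant clause.
        float doc1 = disMax(tie) + disMax(tie, 1f, 1f);     // no albino; elephant in both fields
        float doc2 = disMax(tie, 1f) + disMax(tie, 1f);     // albino in desc; elephant in title
        float doc3 = disMax(tie, 1f) + disMax(tie, 1f, 1f); // albino in desc; elephant in both
        System.out.println(doc3 > doc2 && doc2 > doc1);     // prints true
    }
}
```

      With tie = 0.1 the scores work out to roughly 1.1, 2.0, and 2.1: doc1 gets only a tenth of a point for its second "elephant", while covering a distinct query term is worth a full point.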

      In real apps, tf's and idf's also come into play of course, but can affect
      these either way (i.e., mitigate this fundamental problem or exacerbate it).

      1. dms.tar.gz
        5 kB
        Chuck Williams
      2. DisjunctionMaxQuery.java
        10 kB
        Yonik Seeley
      3. DisjunctionMaxScorer.java
        7 kB
        Yonik Seeley
      4. TestDisjunctionMaxQuery.java
        14 kB
        Yonik Seeley
      5. TestMaxDisjunctionQuery.java
        14 kB
        Hoss Man
      6. ASF.LICENSE.NOT.GRANTED--WikipediaSimilarity.java
        2 kB
        Chuck Williams
      7. ASF.LICENSE.NOT.GRANTED--WikipediaSimilarity.java
        2 kB
        Chuck Williams
      8. ASF.LICENSE.NOT.GRANTED--WikipediaSimilarity.java
        2 kB
        Chuck Williams
      9. ASF.LICENSE.NOT.GRANTED--TestRanking.zip
        10 kB
        Miles Barr
      10. ASF.LICENSE.NOT.GRANTED--TestRanking.zip
        10 kB
        Chuck Williams
      11. ASF.LICENSE.NOT.GRANTED--TestRanking.zip
        10 kB
        Chuck Williams

        Activity

        Chuck Williams added a comment -

        Created an attachment (id=13747)
        Test case to demonstrate the problem, and a fix for it.

        Run TestQuery with Lucene on the classpath. All test output goes to System.out.
        MaxDisjunctionQuery is ready for use (although it provides no QueryParser
        integration). DistributedMultiFieldQueryParser is not complete as it contains
        only the functionality I've needed.

        Daniel Naber added a comment -

        Unfortunately, for this to become part of Lucene the dependencies on Java 1.5
        will need to be removed, I think. Anyway, I finally understand the problem now.
        The reason that nobody complained so far might be that people use mostly AND
        queries today (and for cases where AND queries don't make sense, e.g. synonym
        expansion, the current scoring implementation is okay I think).

        Chuck Williams added a comment -

        Oops, I forgot I use Java 1.5 and forgot I had improved this class by taking
        advantage of it. I don't believe the original version of MaxDisjunctionQuery
        that I sent in email some time ago had this issue, but it seems moot nonetheless
        as Paul Elschot's DisjunctionQuery is going into Lucene and can be used to get
        the same effect. This can be done easily if it is factored such that
        DisjunctionQuery provides a general mechanism for subclasses to initialize
        state, update state, and produce the final score as it combines the subscorers,
        while DisjunctionSumQuery overrides this to implement an optimized version that
        doesn't require the method calls.

        So my recommendation would be to factor DisjunctionQuery to make overriding for
        different combining operators easy and to include a version of
        DisjunctionMaxQuery that uses this to implement the MaxDisjunctionQuery scoring
        semantics (i.e., combining operator = max plus constant times sum of other
        terms). I'd be happy to write that version of DisjunctionMaxQuery once
        DisjunctionQuery comes out in a released version of Lucene if it's not done for
        the release.

        Chuck Williams added a comment -

        (From update of attachment 13747)
        About to upload a new version that fixes a bug.

        Chuck Williams added a comment -

        Created an attachment (id=13791)
        TestRanking, with MaxDisjunctionQuery and DistributingMultiFieldQueryParser

        New version of the attachment that fixes a bug in MaxDisjunctionScorer.skipTo()
        (I didn't want to leave a buggy version posted here). MaxDisjunctionQuery
        requires Java 1.5, although it would be easy to eliminate the dependencies.

        Miles Barr added a comment -

        Created an attachment (id=13883)
        Previous attachment ported to Java 1.4

        I've modified MaxDisjunctionQuery and MaxDisjunctionScorer so they compile
        against Java 1.4.

        Chuck Williams added a comment -

        Created an attachment (id=14131)
        Initial Similarity for use with Wikipedia benchmark relevance test

        This is the untuned Similarity to initially try on the Wikipedia relevance
        benchmark. It assumes there are two fields called "title" and "body" (actually
        only "body" is referenced). It is designed for use with
        DistributingMultiFieldQueryParser using these initial settings:

        private static final String[] DEFAULT_FIELDS = {"title", "body"};
        private static final float[] DEFAULT_BOOSTS = {3.0f, 1.0f};

        DEFAULT_BOOSTS may need tuning as well.

        Chuck Williams added a comment -

        Created an attachment (id=14132)
        Revised WikipediaSimilarity for Java 1.4

        WikipediaSimilarity revised to not use Math.log10, since it isn't in the
        Java 1.4 platform. Also generalized to make the log base a tunable parameter.

        Chuck Williams added a comment -

        Created an attachment (id=14136)
        WikipediaSimilarity for Java 1.4 refactored for interactive parameter tuning

        This is the same WikipediaSimilarity as the previous, but parameterized so that
        the tf & idf logarithm bases can be interactively tuned if required.

        Hoss Man added a comment -

        2cents:

        Even if the DistributingMultiFieldQueryParser isn't deemed "core worthy" (the comments suggest it should be treated more as an example than a complete parser), I definitely think MaxDisjunctionQuery.java should be added to the core. I've been playing with it for a few weeks now in programmatically constructed queries, and I love it – I can't remember how I lived without it.

        Chuck Williams added a comment -

        Thanks Hoss, that's nice to hear. I've been out of the community for a while doing other things, but am about to start a large Lucene-based project that I hope will lead to some interesting contributions along the way.

        Hoss Man added a comment -

        In the interest of encouraging commitment, I've written a unit test to demonstrate/prove the expected behavior of a MaxDisjunctionQuery, with and without a tiebreaker, and in combination with a BooleanQuery wrapper.

        Hoss Man added a comment -

        Be advised: in 1.9, Query.createWeight is declared to throw IOException (it didn't in 1.4), so in order for MaxDisjunctionQuery.java to compile against 1.9, the MaxDisjunctionWeight constructor and MaxDisjunctionQuery.createWeight must be declared to throw IOException as well.

        Yonik Seeley added a comment -

        I'd love to see MaxDisjunctionQuery committed before Lucene 1.9 is final.
        I'd vote to commit the current version, as I think Chuck's recommendations would not change the MaxDisjunctionQuery public interface, correct?

        I assume that the DisjunctionQuery that Chuck mentions would actually be DisjunctionScorer or DisjunctionSumScorer?

        QueryParser support can also be handled later.

        Paul Elschot added a comment -

        There is an issue with the MaxDisjunctionScorer in the .zip attachment, I'm
        sorry I did not see this earlier when I posted on java-dev about this.

        The problem is that MaxDisjunctionScorer uses bubble sort to keep the subscorers
        sorted over the documents in the next() method (line 103), and this does not scale nicely
        as the number of subscorers increases.
        Supposing the number of subscorers that match the document is N,
        the amount of work to be done is proportional to N*N per document.
        In DisjunctionSumScorer a priority queue is used, and there the amount of work is
        proportional to N*log(N) per document.
        So I would recommend rewriting MaxDisjunctionScorer to inherit from a new common
        superclass shared with DisjunctionSumScorer, sharing everything except the
        advanceAfterCurrent() method (which could be abstract in the new superclass).
        It's possible to be more aggressive in refactoring by initializing and adapting
        the score per index document using different methods, but this would take N
        extra method calls per document.

        At the same time the name could be changed to DisjunctionMaxScorer
        for consistency in the org.apache.lucene.search package.

        Regards,
        Paul Elschot

        Yonik Seeley added a comment -

        Changes:

        • renamed MaxDisjunction* to DisjunctionMax*
        • added DisjunctionMaxQuery.getClauses()
        • fixed DisjunctionMaxQuery.hashCode() & equals()
        • made DisjunctionMaxScorer package protected (for now at least)
        Yonik Seeley added a comment -

        I'd rather have something right now that worked well for a small number of clauses, even if it didn't scale to a large number of clauses. All of my use cases consist of small numbers of clauses.

        Since the scorer isn't public, a rewrite can easily be dropped in later when it's done, right?

        For the very common two clause case, will the rewrite you have in mind be as fast as the current version?

        Chuck Williams added a comment -

        The code only uses bubble sort for the incremental resorting of an already-sorted list. The initial sort is done with Arrays.sort(), which is O(n log n). The incremental resort is O(k*n), where k is the number of clauses that match the document last generated. Even if n is large, k will usually be small. Theoretically this is O(n^2) because k could be as high as n, but this is extremely unlikely, especially when n is large. More likely is that k is bounded by a small constant, in which case the algorithm is O(n). It's like Quicksort in that regard – there are outlier cases where it won't perform well, but it will perform better than most alternatives for the vast majority of cases.

        Resorting the whole list every time would perform worse. The best algorithm would probably be to use the standard insert and delete operations on a heap (as in heap sort):

        while the top element generated the last doc:
            heap-remove it
            generate (advance) it
            heap-insert it

        This would yield total time O(k*log n), as with a PriorityQueue.

        I don't think this is much of an issue to worry about, but the algorithm could be revised to use the heap sort operations if others think it is important.

        Chuck
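        The heap-remove/generate/heap-insert loop above can be sketched with a standard java.util.PriorityQueue. The SubScorer postings stand-in and all names here are hypothetical illustrations of the technique, not Lucene classes:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Stand-in for a sub-scorer: walks a fixed ascending postings list.
class SubScorer {
    int doc;              // current document of this sub-scorer
    final int[] docs;     // documents this sub-scorer matches, ascending
    private int pos = 0;
    SubScorer(int[] docs) { this.docs = docs; this.doc = docs[0]; }
    boolean next() {      // advance to the next doc; false when exhausted
        if (++pos >= docs.length) return false;
        doc = docs[pos];
        return true;
    }
}

public class HeapAdvanceSketch {
    public static void main(String[] args) {
        // Min-heap ordered by each sub-scorer's current doc.
        PriorityQueue<SubScorer> heap =
            new PriorityQueue<>(Comparator.comparingInt(s -> s.doc));
        heap.add(new SubScorer(new int[]{1, 3, 7}));
        heap.add(new SubScorer(new int[]{3, 8}));
        heap.add(new SubScorer(new int[]{2, 3}));

        StringBuilder matched = new StringBuilder();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc;         // smallest current doc = next match
            matched.append(doc).append(' ');
            // O(k log n): re-heap only the k scorers positioned on this doc.
            while (!heap.isEmpty() && heap.peek().doc == doc) {
                SubScorer top = heap.poll();   // heap-remove it
                if (top.next()) heap.add(top); // generate it, heap-insert it
            }
        }
        System.out.println(matched.toString().trim()); // prints 1 2 3 7 8
    }
}
```

        Only the scorers sitting on the matched document are popped, advanced, and reinserted, so the cost per match is k heap operations rather than a pass over all n sub-scorers.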

        Paul Elschot added a comment -

        The ScorerDocQueue.java here has a single operation for something
        very similar to the heap-remove/generate/heap-insert:

        http://issues.apache.org/jira/browse/LUCENE-365

        There is also a test class for testing the performance of disjunction scorers,
        which could be used to find out how large k must be to warrant the use
        of a heap (priority queue).

        Regards,
        Paul Elschot

        Yonik Seeley added a comment -

        Added Iterable to DisjunctionMaxQuery as a semi-Java-5-friendly way to iterate over the disjuncts. Added the ability to add all disjuncts from an Iterable (Collection, List, another DisjunctionMaxQuery, etc.).

        I committed DisjunctionMaxQuery/Scorer/Test since the interface should be stable, and the implementation seems to work fine for the common cases. I'll be happy to evaluate and commit performance updates when they become available.

        I'll leave this bug open since it contains multiple issues.

        Chuck Williams added a comment -

        The attached archive contains a revised DisjunctionMaxScorer that maintains the disjunct scorers as a min-heap instead of a sorted list. This reduces the time per next() to O(k*log n) instead of O(k*n), per Paul's earlier comment. Most of the class changed, so I included both a patch and the new class. This is only lightly tested; the JUnit test passes, along with the entire Lucene test suite. I'm not working anymore on the project that led to the original class and so have not tested it there. I'm working on a new project that will use this and so it will get thoroughly tested, but I am not yet at an appropriate point. I thought it was best to post the patch now, as I believe it is correct and the unit test does pass. Perhaps others would like to try it out. E.g., it would be interesting to run the performance test that Paul mentions.

        Also, I found and fixed another bug while updating the class. In the current committed version, there is a problem if skipTo() exhausts all the scorers: it did not set "more" to false, so a subsequent call to next() would attempt to access the non-existent first scorer.
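        A minimal sketch of the exhaustion pattern described here, assuming a hypothetical sub-scorer heap and "more" flag (illustrative of the fix, not the committed Lucene source):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class SkipToSketch {
    // Stand-in sub-scorer over a fixed ascending postings list.
    static class Sub {
        final int[] docs;
        private int pos = 0;
        Sub(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        boolean skipTo(int target) {   // advance to first doc >= target
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    final PriorityQueue<Sub> subs =
        new PriorityQueue<>(Comparator.comparingInt(Sub::doc));
    boolean more = true;               // false once every sub-scorer is exhausted

    boolean skipTo(int target) {
        while (!subs.isEmpty() && subs.peek().doc() < target) {
            Sub top = subs.poll();
            if (top.skipTo(target)) subs.add(top); // re-heap if still alive
        }
        more = !subs.isEmpty();        // the missing assignment: without it, a
        return more;                   // later next() would peek an empty heap
    }

    public static void main(String[] args) {
        SkipToSketch s = new SkipToSketch();
        s.subs.add(new Sub(new int[]{2, 5}));
        s.subs.add(new Sub(new int[]{3}));
        System.out.println(s.skipTo(4)); // prints true: doc 5 remains
        System.out.println(s.skipTo(9)); // prints false: all scorers exhausted
    }
}
```

        Without the "more" assignment, the second call would still report a match pending and a subsequent next() would dereference the empty heap.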

        It would be nice to get some form of DistributingMultiFieldQueryParser in so that this is easy to use.

        Thanks to Yonik for committing this functionality!

        Chuck

        Yonik Seeley added a comment -

        Thanks for the changes Chuck!

        Your patch was backwards, BTW

        I haven't had a chance to run any benchmarks, but I committed this because it also fixes a bug.
        Since it also looks like the uses of /2 and *2 were all unsigned, I replaced them with shifts. The multiply doesn't matter much, but IDIV is horribly slow (between 20 and 80 cycles, depending on the arch and operands). Not that I thought it was a bottleneck, but I have trouble avoiding that "root of all evil", premature optimization.

        Otis Gospodnetic added a comment -

        Yonik - you committed this? The case is still showing as open, so you may want to close it if you're done with it.

        Yonik Seeley added a comment -

        > Yonik - you committed this? The case is still showing as open, so you may want to close it if you're done with it.

        This bug also contains a different Similarity implementation, as well as a DistributingMultiFieldQueryParser. I only committed the DisjunctionMaxQuery part of it and that's why I left it open.

        Chuck Williams added a comment -

        FYI, I've recently noticed the new implementation of MultiFieldQueryParser (new to me, since I was out of touch for about 6 months or so). This now does the distribution, so I'm no longer of the opinion that DistributingMultiFieldQueryParser should be committed. A better approach would be to generalize MultiFieldQueryParser to be able to use MaxDisjunctionQuery as its container instead of BooleanQuery in the appropriate places. I'll look at this soon and will submit a revised patch unless there is some reason the details don't work out. Even in that case, the same approach used in MultiFieldQueryParser of specializing certain methods of QueryParser could be used for DistributingMultiFieldQueryParser, rather than the post-parsing traversal/transformation it currently does.

        Hoss Man added a comment -

        The WikipediaSimilarity seems to have been included only as an example for the purposes of comparison testing, not as an item to be committed.

        Given Chuck's comment on 21/Dec/05 I'm of the opinion this issue should be closed.

        Yonik Seeley added a comment -

        OK, closing this bug.
        We can open separate bugs for any alternate Similarity, or any query parser enhancements.


          People

          • Assignee: Unassigned
          • Reporter: Chuck Williams
          • Votes: 4
          • Watchers: 0