Lucene - Core / LUCENE-346

[PATCH] disable coord for generated BooleanQueries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: Linux
      Platform: PC

    • Bugzilla Id:
      33472

      Description

      Here's a patch that disables Similarity.coord() in the scoring of most
      automatically generated boolean queries. The coord() score factor is
      appropriate when clauses are independently specified by a user, but is usually
      not appropriate when clauses are generated automatically, e.g., by a fuzzy,
      wildcard or range query. Matches on such automatically generated queries are
      currently penalized for not matching all of the generated terms.
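The penalty described above can be illustrated with a small self-contained model. This is a sketch, not Lucene code: the only assumption taken from Lucene is that DefaultSimilarity computes coord(overlap, maxOverlap) as overlap / maxOverlap; the class and method names are hypothetical, and tf/idf and norms are ignored. A wildcard query expanded into four terms, scored against a document that contains only one of them, loses three quarters of its score to coord.

```java
import java.util.List;

public class CoordDemo {

    // Same formula as DefaultSimilarity.coord(): fraction of clauses matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Sums matching clause scores (0 = clause did not match) and, unless
    // coord is disabled, scales the sum by coord(overlap, maxOverlap).
    static float score(List<Float> clauseScores, boolean disableCoord) {
        float sum = 0f;
        int overlap = 0;
        for (float s : clauseScores) {
            if (s > 0f) {
                sum += s;
                overlap++;
            }
        }
        if (disableCoord || overlap == 0) {
            return sum;
        }
        return sum * coord(overlap, clauseScores.size());
    }

    public static void main(String[] args) {
        // Hypothetical wildcard expansion into four term clauses; this
        // document contains only one of the expanded terms.
        List<Float> expanded = List.of(0.5f, 0f, 0f, 0f);
        System.out.println(score(expanded, false)); // prints 0.125
        System.out.println(score(expanded, true));  // prints 0.5
    }
}
```

With coord enabled the document is scored as if it were a poor match, even though matching one expanded term is the normal case for a wildcard query.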

        Activity

        lucenebugs@danielnaber.de Daniel Naber added a comment -

        Seems this has been applied, so I'm closing this bug report.

        paul.elschot@xs4all.nl Paul Elschot added a comment -

        The BooleanScorer2 can be simplified considerably when no
        coordination is needed. Basically, all the makeCountingSumScorer..
        methods could be renamed to makeSumScorer.., and the
        private Coordinator class would be unnecessary in that case.
        I'll start working on a non-coordinating version, perhaps
        as a superclass of BooleanScorer2.
        Meanwhile, the 'flat' coord method in Similarity will do fine.

        Regards,
        Paul Elschot
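Why the Coordinator becomes unnecessary can be seen in a hypothetical, self-contained model (not Lucene's BooleanScorer2). Here `Coord` stands in for Similarity.coord(int, int), `sumScore` plays the role of a makeSumScorer.. result, and `countingSumScore` adds the extra bookkeeping a coordinating scorer needs. With a 'flat' coord that always returns 1.0f, the counting version produces exactly the plain sum, so the counting work buys nothing.

```java
public class SumVsCoordinated {

    // Stand-in for Similarity.coord(int overlap, int maxOverlap).
    interface Coord {
        float coord(int overlap, int maxOverlap);
    }

    // DefaultSimilarity-style coord: fraction of clauses matched.
    static final Coord DEFAULT = (overlap, max) -> overlap / (float) max;

    // The 'flat' coord: always 1.0f, so partial matches are never penalized.
    static final Coord FLAT = (overlap, max) -> 1.0f;

    // Analogue of a makeSumScorer.. result: just add the clause scores.
    static float sumScore(float[] clauseScores) {
        float sum = 0f;
        for (float s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    // Analogue of a makeCountingSumScorer.. result: the extra Coordinator
    // bookkeeping counts matching clauses, then applies coord().
    static float countingSumScore(float[] clauseScores, Coord c) {
        int overlap = 0;
        for (float s : clauseScores) {
            if (s > 0f) {
                overlap++;
            }
        }
        return sumScore(clauseScores) * c.coord(overlap, clauseScores.length);
    }

    public static void main(String[] args) {
        float[] doc = {0.25f, 0.5f, 0f, 0f}; // matches 2 of 4 clauses
        System.out.println(sumScore(doc));                  // prints 0.75
        System.out.println(countingSumScore(doc, FLAT));    // prints 0.75
        System.out.println(countingSumScore(doc, DEFAULT)); // prints 0.375
    }
}
```

In real Lucene the 'flat' behaviour would come from a Similarity subclass whose coord() returns 1.0f; this sketch only models the arithmetic.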

        cutting@apache.org cutting@apache.org added a comment -

        Created an attachment (id=14236)
        new version of patch

        You're right. I guess I was looking at the deprecated parse() method. Sorry.

        I've updated the patch to include MFQP (MultiFieldQueryParser).

        Should we commit something like this?

        daniel.naber@t-online.de Daniel Naber added a comment -

        "(f1:t1 f2:t1) (f1:t2 f2:t2)" – that's what MFQP in SVN does already, unless
        you use the deprecated calls. Or am I missing something?

        cutting@apache.org cutting@apache.org added a comment -

        Note that this does not yet fix MultiFieldQueryParser. That should probably be
        modified so that, for the query "t1 t2" over fields f1 and f2, it generates a
        query like:

        (f1:t1 f2:t1) (f1:t2 f2:t2)

        In this case the inner queries should have coord disabled while the outer query
        should not. I think the best approach for MultiFieldQueryParser is to
        re-implement it as a subclass of QueryParser that overrides the query creation
        methods. But that's a separate patch...
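The proposed structure for "t1 t2" over fields f1 and f2 can be modelled with a tiny hypothetical query tree. The classes `Node`, `Term`, `Bool` and the method `expand` are illustrative names, not Lucene's API, and scoring is simplified to "sum of matching clauses, times overlap/maxOverlap unless coord is disabled". Each per-term group is generated automatically, so it disables coord; the outer query represents the user's terms, so it keeps coord.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiFieldSketch {

    // A query node scores a document given per-term scores (missing = no match).
    interface Node {
        float score(Map<String, Float> termScores);
    }

    static final class Term implements Node {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public float score(Map<String, Float> termScores) {
            return termScores.getOrDefault(field + ":" + text, 0f);
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    static final class Bool implements Node {
        final List<Node> clauses;
        final boolean disableCoord;

        Bool(List<Node> clauses, boolean disableCoord) {
            this.clauses = clauses;
            this.disableCoord = disableCoord;
        }

        // Sum of matching clause scores, times overlap/maxOverlap unless disabled.
        public float score(Map<String, Float> termScores) {
            float sum = 0f;
            int overlap = 0;
            for (Node c : clauses) {
                float s = c.score(termScores);
                if (s > 0f) {
                    sum += s;
                    overlap++;
                }
            }
            if (disableCoord || clauses.isEmpty()) {
                return sum;
            }
            return sum * overlap / clauses.size();
        }

        // Nested boolean clauses are parenthesized, as in the query above.
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Node c : clauses) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c instanceof Bool ? "(" + c + ")" : c.toString());
            }
            return sb.toString();
        }
    }

    // For each user term, OR the term across all fields (coord disabled),
    // then combine the per-term groups in an outer query that keeps coord.
    static Node expand(List<String> terms, List<String> fields) {
        List<Node> outer = new ArrayList<>();
        for (String t : terms) {
            List<Node> inner = new ArrayList<>();
            for (String f : fields) {
                inner.add(new Term(f, t));
            }
            outer.add(new Bool(inner, true));
        }
        return new Bool(outer, false);
    }

    public static void main(String[] args) {
        Node q = expand(List.of("t1", "t2"), List.of("f1", "f2"));
        System.out.println(q); // prints (f1:t1 f2:t1) (f1:t2 f2:t2)
        // Document matching t1 only in f1 and t2 only in f2:
        Map<String, Float> doc = Map.of("f1:t1", 0.5f, "f2:t2", 0.25f);
        System.out.println(q.score(doc)); // prints 0.75
    }
}
```

In this model, leaving coord on for the inner groups would halve each group's score for the same document (0.375 overall), even though every user term was matched in some field.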

        cutting@apache.org cutting@apache.org added a comment -

        Created an attachment (id=14232)
        patch to disable coord scoring in most generated boolean queries


          People

          • Assignee:
            java-dev@lucene.apache.org Lucene Developers
            Reporter:
            cutting Doug Cutting
          • Votes:
            0
            Watchers:
            0
