Lucene - Core
  1. Lucene - Core
  2. LUCENE-533

SpanQuery scoring: SpanWeight lacks a recursive traversal of the query tree

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None

      Description

      I found the computing of weights to be somewhat different according to the query type (BooleanQuery versus SpanQuery) :

      org.apache.lucene.search.BooleanQuery.BooleanWeight :

      public BooleanWeight(Searcher searcher)
      throws IOException {
      this.similarity = getSimilarity(searcher);
      for (int i = 0 ; i < clauses.size(); i++)

      { BooleanClause c = (BooleanClause)clauses.elementAt(i); weights.add(c.getQuery().createWeight(searcher)); }

      }

      which looks like a recursive descent through the tree, taking into account the weights of all the nodes, whereas :

      org.apache.lucene.search.spans.SpanWeight :

      public SpanWeight(SpanQuery query, Searcher searcher)
      throws IOException

      { this.similarity = query.getSimilarity(searcher); this.query = query; this.terms = query.getTerms(); idf = this.query.getSimilarity(searcher).idf(terms, searcher); }

      lacks any traversal and according to what I have understood so far from the rest
      of the code, only takes into account the boost of the tree root in SumOfSquareWeights(),
      which is consistent with the resulting scores not considering the boost of the tree
      leaves.

      vintz

        Activity

        Erick Erickson made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Erick Erickson added a comment -

        SPRING_CLEANING_2013 JIRAS. Spans is an abstract class, suspect this is already done.

        Show
        Erick Erickson added a comment - SPRING_CLEANING_2013 JIRAS. Spans is an abstract class, suspect this is already done.
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563310 ] jira [ 12584174 ]
        Mark Thomas made changes -
        Workflow jira [ 12352587 ] Default workflow, editable Closed status [ 12563310 ]
        Hide
        Paul Elschot added a comment -

        I see I missed the introduction of payloads into Spans. As back compat is broken anyway, one might as well get rid of the Spans interface completely and make Spans an abstract class.
        Since it is only the interface that is in the way of changes, any way to get rid of the Spans as an interface is ok with me.

        Payloads can be yet another way to introduce a (term/spans) weight, so one might subclass these from WeightedSpans:
        Spans -> WeightedSpans -> PayloadSpans.
        That would also allow to use WeightedSpans inside an object hierarchy for scoring nested span queries, and to use PayloadSpans as a leafs.

        Scoring nested span queries is not trivial, and allowing a weight on each spans does not make it simpler, but at least it would allow span queries to behave more like boolean queries.

        Show
        Paul Elschot added a comment - I see I missed the introduction of payloads into Spans. As back compat is broken anyway, one might as well get rid of the Spans interface completely and make Spans an abstract class. Since it is only the interface that is in the way of changes, any way to get rid of the Spans as an interface is ok with me. Payloads can be yet another way to introduce a (term/spans) weight, so one might subclass these from WeightedSpans: Spans -> WeightedSpans -> PayloadSpans. That would also allow to use WeightedSpans inside an object hierarchy for scoring nested span queries, and to use PayloadSpans as a leafs. Scoring nested span queries is not trivial, and allowing a weight on each spans does not make it simpler, but at least it would allow span queries to behave more like boolean queries.
        Hide
        Mark Miller added a comment -

        Paul:

        Spans is breaking back compat in this release. This is the opportunity to change anything.

        Do you think we should add something like this? Do you think it might make sense to make Spans abstract?

        Its an interface thats gaining methods this release, so back compat is gone anyway (Spans back compat was lost in the last release, and we are correcting things to a degree in 2.9). If we can help pave the way for the future in a reasonably small amount of time - this is likely the best opportunity.

        Show
        Mark Miller added a comment - Paul: Spans is breaking back compat in this release. This is the opportunity to change anything. Do you think we should add something like this? Do you think it might make sense to make Spans abstract? Its an interface thats gaining methods this release, so back compat is gone anyway (Spans back compat was lost in the last release, and we are correcting things to a degree in 2.9). If we can help pave the way for the future in a reasonably small amount of time - this is likely the best opportunity.
        Hide
        Paul Elschot added a comment -

        One problem here is that the Spans interface does not have a property for a weight value.

        So one way to start this could be to deprecate Spans and to define something like this:

        public abstract class WeightedSpans implements Spans {
          ... abstract methods as in Spans interface;
        
          public float getValue()
          // implement getValue here to allow WeightedSpans to replace Spans everywhere
          { return 1.0; }
        }
        
        Show
        Paul Elschot added a comment - One problem here is that the Spans interface does not have a property for a weight value. So one way to start this could be to deprecate Spans and to define something like this: public abstract class WeightedSpans implements Spans { ... abstract methods as in Spans interface ; public float getValue() // implement getValue here to allow WeightedSpans to replace Spans everywhere { return 1.0; } }
        Hide
        Mark Miller added a comment -

        Hoping we can get this one worked out Vincent. It is odd that the boost are ignored, and certainly seems incorrect to me.

        I worked on a fix a few months ago, but unfortunately, things complicated fairly quickly, and I think you end up needing something similar to what BooleanQuery does to compute the tree boosts, which adds a lot of complication and could be a lot of performance loss (Span queries are already generally slower than standard queries).

        Hope to get this resolved at some point in the future though, just not as simple as I would have hoped. Possibly why it was punted on to begin with.

        Show
        Mark Miller added a comment - Hoping we can get this one worked out Vincent. It is odd that the boost are ignored, and certainly seems incorrect to me. I worked on a fix a few months ago, but unfortunately, things complicated fairly quickly, and I think you end up needing something similar to what BooleanQuery does to compute the tree boosts, which adds a lot of complication and could be a lot of performance loss (Span queries are already generally slower than standard queries). Hope to get this resolved at some point in the future though, just not as simple as I would have hoped. Possibly why it was punted on to begin with.
        Michael Busch made changes -
        Field Original Value New Value
        Priority Major [ 3 ] Minor [ 4 ]
        Vincent Le Maout created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Vincent Le Maout
          • Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development