Lucene - Core
  1. Lucene - Core
  2. LUCENE-2754

add spanquery support for all multitermqueries

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I set fix version: 4.0, but possibly we could do this for 3.x too

      Currently, we have a special SpanRegexQuery in contrib, and issues like LUCENE-522 open for SpanFuzzyQuery.
      The SpanRegexQuery in contrib is a little messy additionally.

      For any arbitrary MultiTermQueries to work as a SpanQuery, there are only 3 requirements:

      1. The un-rewritten query must extend SpanQuery so it can be included in Span clauses
      2. The rewritten query should be SpanOrQuery instead of BooleanQuery
      3. The rewritten term clauses should be SpanTermQueries.

      Instead of having logic like this for each query, i suggest adding two rewrite methods:

      • ScoringSpanBoolean rewrite
      • TopTermsSpanBoolean rewrite

      as a start i wrote these up, and added a SpanMultiTermQueryWrapper that can be used to wrap any multitermquery this way.
      there are a few kinks, but I think the MTQ policeman can probably help get through them.

      1. LUCENE-2754.patch
        15 kB
        Robert Muir
      2. LUCENE-2754.patch
        24 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1035118

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1035118
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1035096

          Now backporting...

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1035096 Now backporting...
          Hide
          Uwe Schindler added a comment -

          The linked LUCENE-2757 will add support for this while also refactoring MTQ.

          Show
          Uwe Schindler added a comment - The linked LUCENE-2757 will add support for this while also refactoring MTQ.
          Hide
          Robert Muir added a comment -

          Attached is an improved patch, still not perfect but better.

          I added Uwe's suggestion of delegating all methods in the wrapper, and also
          setting the rewrite method in the ctor and documenting that it does this.

          I added a simple test case in search.spans that wraps some wildcard and fuzzy queries,
          and added examples and javadocs.

          I also deprecated the SpanRegexQuery, converted the tests for the contrib SpanRegexQuery to use the wrapper directly, and deprecated the old tests, but they still work too.

          I would prefer to avoid the heavy refactoring of MultiTermQuery rewrites (easier on another issue?) just because it would make backporting to 3.x a little more tricky.

          Anyway, I think this is pretty close, maybe just needs another police investigation
          before committing.

          Show
          Robert Muir added a comment - Attached is an improved patch, still not perfect but better. I added Uwe's suggestion of delegating all methods in the wrapper, and also setting the rewrite method in the ctor and documenting that it does this. I added a simple test case in search.spans that wraps some wildcard and fuzzy queries, and added examples and javadocs. I also deprecated the SpanRegexQuery, converted the tests for the contrib SpanRegexQuery to use the wrapper directly, and deprecated the old tests, but they still work too. I would prefer to avoid the heavy refactoring of MultiTermQuery rewrites (easier on another issue?) just because it would make backporting to 3.x a little more tricky. Anyway, I think this is pretty close, maybe just needs another police investigation before committing.
          Hide
          Robert Muir added a comment -

          What about the third BQ rewrite mode (is it needed or is scoring different (we have two rerwite modes for fuzzy)?

          In this case, i thought not because it would be constant score queries, not spanqueries.
          I think a constant score spanquery would be useless

          So the rewrite methods can only be used for real SpanQueries, how about moving those rewrite methods out of MTQ? I know, then we must make some classes more public... But I was already thinking about the code mess in MTQ. Maybe we should remove those modes and put into separate classes at all.

          I agree completely... its tricky to scroll through the massive .java file and find what we want.
          I would also prefer in this case for the Span-rewrite methods to be in the actual o.a.l.search.span package...

          Show
          Robert Muir added a comment - What about the third BQ rewrite mode (is it needed or is scoring different (we have two rerwite modes for fuzzy)? In this case, i thought not because it would be constant score queries, not spanqueries. I think a constant score spanquery would be useless So the rewrite methods can only be used for real SpanQueries, how about moving those rewrite methods out of MTQ? I know, then we must make some classes more public... But I was already thinking about the code mess in MTQ. Maybe we should remove those modes and put into separate classes at all. I agree completely... its tricky to scroll through the massive .java file and find what we want. I would also prefer in this case for the Span-rewrite methods to be in the actual o.a.l.search.span package...
          Hide
          Uwe Schindler added a comment -

          When we respect both things, why do we need the wrapper at all

          Ah because query must extend SpanQuery That solves my problem.

          So the rewrite methods can only be used for real SpanQueries, how about moving those rewrite methods out of MTQ? I know, then we must make some classes more public... But I was already thinking about the code mess in MTQ. Maybe we should remove those modes and put into separate classes at all.

          +1 to backport to 3.x

          Show
          Uwe Schindler added a comment - When we respect both things, why do we need the wrapper at all Ah because query must extend SpanQuery That solves my problem. So the rewrite methods can only be used for real SpanQueries, how about moving those rewrite methods out of MTQ? I know, then we must make some classes more public... But I was already thinking about the code mess in MTQ. Maybe we should remove those modes and put into separate classes at all. +1 to backport to 3.x
          Hide
          Uwe Schindler added a comment -

          What about the third BQ rewrite mode (is it needed or is scoring different (we have two rerwite modes for fuzzy)?

          Show
          Uwe Schindler added a comment - What about the third BQ rewrite mode (is it needed or is scoring different (we have two rerwite modes for fuzzy)?
          Hide
          Uwe Schindler added a comment -

          Looks good, even the generics policeman is happy (with the wrapper which is like MTQWrapperFilter! Phantastic

          About the nocommit and more problems related to this:

          • I would remove separate boosts for wrapped and wrapper query. The wrapper should simply delegate get/setBoost(). We have other examples in Lucene that do the same (somewhere...). So the boost does not need to be set in rewrite() and toString() may not show two different boosts. toString() should not print any boost at all (only the wrapped query will do).
          • The same should be done in the rewrite method, it should simply delegate. In ctor, it could set the rewrite mode initially to a span default. We should only document this.

          When we respect both things, why do we need the wrapper at all? Wrapping and setting a rewrite mode is the same work. And the SpanRegExQuery in contrib could simply set the rewrite mode in the ctor (like Fuzzy or NRQ does) and maybe override toString() for nice output.

          Show
          Uwe Schindler added a comment - Looks good, even the generics policeman is happy (with the wrapper which is like MTQWrapperFilter! Phantastic About the nocommit and more problems related to this: I would remove separate boosts for wrapped and wrapper query. The wrapper should simply delegate get/setBoost(). We have other examples in Lucene that do the same (somewhere...). So the boost does not need to be set in rewrite() and toString() may not show two different boosts. toString() should not print any boost at all (only the wrapped query will do). The same should be done in the rewrite method, it should simply delegate. In ctor, it could set the rewrite mode initially to a span default. We should only document this. When we respect both things, why do we need the wrapper at all? Wrapping and setting a rewrite mode is the same work. And the SpanRegExQuery in contrib could simply set the rewrite mode in the ctor (like Fuzzy or NRQ does) and maybe override toString() for nice output.
          Hide
          Robert Muir added a comment -

          here's an initial patch... all the SpanRegexQuery tests pass, but it could use its own tests too.

          theres a nocommit mentioning what i don't like.

          Show
          Robert Muir added a comment - here's an initial patch... all the SpanRegexQuery tests pass, but it could use its own tests too. theres a nocommit mentioning what i don't like.

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development