Lucene - Core
  1. Lucene - Core
  2. LUCENE-1599

SpanRegexQuery and SpanNearQuery is not working with MultiSearcher

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1
    • Fix Version/s: 2.9
    • Component/s: modules/other
    • Labels:
      None
    • Environment:

      lucene-core 2.4.1, lucene-regex 2.4.1

    • Lucene Fields:
      New, Patch Available

      Description

      MultiSearcher is using:
      queries[i] = searchables[i].rewrite(original);
      to rewrite query and then use combine to combine them.

      But SpanRegexQuery's rewrite is different from others.
      After you call it on the same query, it always return the same rewritten queries.

      As a result, only search on the first IndexSearcher work. All others are using the first IndexSearcher's rewrite queries.
      So many terms are missing and return unexpected result.

      Billow

      1. LUCENE-1599.patch
        8 kB
        Mark Miller
      2. TestSpanRegexBug.java
        3 kB
        Billow Gao

        Activity

        Hide
        Mark Miller added a comment -

        Could you write a quick test case? I just give it a simple go, and for the case I used, I didn't see the problem. If you can provide some sample code that shows exactly what tickles it, I'll work on fixing it.

        Show
        Mark Miller added a comment - Could you write a quick test case? I just give it a simple go, and for the case I used, I didn't see the problem. If you can provide some sample code that shows exactly what tickles it, I'll work on fixing it.
        Hide
        Mark Miller added a comment -

        Hey Billow, do you have anything to add to help us track this down? If not, I'm going to close it as incomplete eventually.

        Show
        Mark Miller added a comment - Hey Billow, do you have anything to add to help us track this down? If not, I'm going to close it as incomplete eventually.
        Hide
        Billow Gao added a comment -

        Sorry, I should write the test case earlier.
        Forgot to check my email on this account.

        Here is the testcase.

        The problem is very clear.
        When there is a SpanNearQuery query which have SpanRegexQuery in it.
        And MultiSearcher is used, the query will only be rewritten once on the first IndexSearcher in the MultiSearcher.
        So only terms in the first IndexSearcher are used.
        And of course, it will fail on other IndexSearchers since it didn't use terms in those IndexSearchers.

        Billow

        Show
        Billow Gao added a comment - Sorry, I should write the test case earlier. Forgot to check my email on this account. Here is the testcase. The problem is very clear. When there is a SpanNearQuery query which have SpanRegexQuery in it. And MultiSearcher is used, the query will only be rewritten once on the first IndexSearcher in the MultiSearcher. So only terms in the first IndexSearcher are used. And of course, it will fail on other IndexSearchers since it didn't use terms in those IndexSearchers. Billow
        Hide
        Billow Gao added a comment -

        Just uploaded the testcase TestSpanRegexBug.java.

        Sorry, I should write the test case earlier.
        Forgot to check my email on this account.

        Here is the testcase.

        The problem is very clear.
        When there is a SpanNearQuery query which have SpanRegexQuery in it.
        And MultiSearcher is used, the query will only be rewritten once on the first IndexSearcher in the MultiSearcher.
        So only terms in the first IndexSearcher are used.
        And of course, it will fail on other IndexSearchers since it didn't use terms in those IndexSearchers.

        Billow

        Show
        Billow Gao added a comment - Just uploaded the testcase TestSpanRegexBug.java. Sorry, I should write the test case earlier. Forgot to check my email on this account. Here is the testcase. The problem is very clear. When there is a SpanNearQuery query which have SpanRegexQuery in it. And MultiSearcher is used, the query will only be rewritten once on the first IndexSearcher in the MultiSearcher. So only terms in the first IndexSearcher are used. And of course, it will fail on other IndexSearchers since it didn't use terms in those IndexSearchers. Billow
        Hide
        Michael McCandless added a comment -

        MultiSearcher seems to rewrite against all its searchers, and then uses Query.combine to merge all the the results? Maybe the issue is in Query.combine?

        Seems like we should fix this for 2.9.

        Show
        Michael McCandless added a comment - MultiSearcher seems to rewrite against all its searchers, and then uses Query.combine to merge all the the results? Maybe the issue is in Query.combine? Seems like we should fix this for 2.9.
        Hide
        Billow Gao added a comment -

        Not there.
        If you use the query one those two IndexSearcher one by one, then you will find that only the first search has a match.
        The second one always fail.

        For example:

        SpanRegexQuery srq = new SpanRegexQuery(new Term("field", "a.*"));
        SpanRegexQuery stq = new SpanRegexQuery(new Term("field", "b.*"));
        SpanNearQuery query = new SpanNearQuery(new SpanQuery[]

        { srq, stq }

        ,
        6, true);
        IndexSearcher[] arrSearcher = new IndexSearcher[2];
        arrSearcher[0] = new IndexSearcher(indexStoreA);
        arrSearcher[1] = new IndexSearcher(indexStoreB);

        Hits hits = arrSearcher[0].search(query);
        assertEquals(1, hits.length());

        hits = arrSearcher[1].search(query);

        //fail here because query rewrite will not use new term.
        //The problem should sit inside spanquery, maybe it only call rewrite function once.
        //Anyway, I didn't check its source code yet.
        assertEquals(1, hits.length());

        arrSearcher[0].close();
        arrSearcher[1].close();

        Show
        Billow Gao added a comment - Not there. If you use the query one those two IndexSearcher one by one, then you will find that only the first search has a match. The second one always fail. For example: SpanRegexQuery srq = new SpanRegexQuery(new Term("field", "a.*")); SpanRegexQuery stq = new SpanRegexQuery(new Term("field", "b.*")); SpanNearQuery query = new SpanNearQuery(new SpanQuery[] { srq, stq } , 6, true); IndexSearcher[] arrSearcher = new IndexSearcher [2] ; arrSearcher [0] = new IndexSearcher(indexStoreA); arrSearcher [1] = new IndexSearcher(indexStoreB); Hits hits = arrSearcher [0] .search(query); assertEquals(1, hits.length()); hits = arrSearcher [1] .search(query); //fail here because query rewrite will not use new term. //The problem should sit inside spanquery, maybe it only call rewrite function once. //Anyway, I didn't check its source code yet. assertEquals(1, hits.length()); arrSearcher [0] .close(); arrSearcher [1] .close();
        Hide
        Michael McCandless added a comment -

        Hmm... do you have a patch in mind (to Lucene's sources) to fix this?

        Show
        Michael McCandless added a comment - Hmm... do you have a patch in mind (to Lucene's sources) to fix this?
        Hide
        Billow Gao added a comment -

        I rewrote the MultiSearcher for my system. Actually, we rewrote most lucene analyzer/parser/search functions...
        So my patch won't work for others.

        I will look into the SpanQuery function when I get a chance. I believe that our patch should be applied there.
        Also, I don't like MultiSearcher's combine function.

        Show
        Billow Gao added a comment - I rewrote the MultiSearcher for my system. Actually, we rewrote most lucene analyzer/parser/search functions... So my patch won't work for others. I will look into the SpanQuery function when I get a chance. I believe that our patch should be applied there. Also, I don't like MultiSearcher's combine function.
        Hide
        Michael McCandless added a comment -

        I rewrote the MultiSearcher for my system. Actually, we rewrote most lucene analyzer/parser/search functions...

        Sounds interesting! Anything that could be contributed back?

        Also, I don't like MultiSearcher's combine function.

        I don't either! It seems like Lucene should rewrite then search, against each searcher, rather than rewriting all up front and combining.

        Show
        Michael McCandless added a comment - I rewrote the MultiSearcher for my system. Actually, we rewrote most lucene analyzer/parser/search functions... Sounds interesting! Anything that could be contributed back? Also, I don't like MultiSearcher's combine function. I don't either! It seems like Lucene should rewrite then search, against each searcher, rather than rewriting all up front and combining.
        Hide
        Billow Gao added a comment -

        The system is specially designed for special requirement. So they are not for general purpose.
        Added many limitation and extended some search features.
        I will re-organize and extract those useful for public, and then contribute them.
        But not a good time to do that now. Will wait until they are stable enough.

        Show
        Billow Gao added a comment - The system is specially designed for special requirement. So they are not for general purpose. Added many limitation and extended some search features. I will re-organize and extract those useful for public, and then contribute them. But not a good time to do that now. Will wait until they are stable enough.
        Hide
        Mark Miller added a comment -

        Something is modifying the original query itself.

        In MultiSearcher.rewrite:

        public Query rewrite(Query original) throws IOException {
        Query[] queries = new Query[searchables.length];
        for (int i = 0; i < searchables.length; i++)

        { queries[i] = searchables[i].rewrite(original); }

        return queries[0].combine(queries);
        }

        On the first time through the loop, the SpanRegexQuery will contain the regex pattern, but the first time it hits rewrite, it will be changed to the expanded query. This shouldnt happen.
        On the next time through the loop, original query will not contain a regex pattern, but will instead be the first time through the loop's rewritten query. Oddness.

        I'll dig in and try and fix for 2.9.

        Show
        Mark Miller added a comment - Something is modifying the original query itself. In MultiSearcher.rewrite: public Query rewrite(Query original) throws IOException { Query[] queries = new Query [searchables.length] ; for (int i = 0; i < searchables.length; i++) { queries[i] = searchables[i].rewrite(original); } return queries [0] .combine(queries); } On the first time through the loop, the SpanRegexQuery will contain the regex pattern, but the first time it hits rewrite, it will be changed to the expanded query. This shouldnt happen. On the next time through the loop, original query will not contain a regex pattern, but will instead be the first time through the loop's rewritten query. Oddness. I'll dig in and try and fix for 2.9.
        Hide
        Mark Miller added a comment -

        Well yuck.

        SpanNearQuery does this clone call in its rewrite method but there is no clone impl - so it looks like it returns a SpanNearQuery with the same clauses instance. So it looks like this gets tangled up with the real query, and the real query gets modified to the rewritten form for the rewrite on searchable2.

        I think anyway. I wanted to just test a fix to if that was right, but SpanNearQuery can contain any span queries, so I guess all of them might need clone impls and we may have to clone the whole chain?

        A little tired to think about it at the moment Looks like the issue is with the cloning in SpanNearQuery though.

        Show
        Mark Miller added a comment - Well yuck. SpanNearQuery does this clone call in its rewrite method but there is no clone impl - so it looks like it returns a SpanNearQuery with the same clauses instance. So it looks like this gets tangled up with the real query, and the real query gets modified to the rewritten form for the rewrite on searchable2. I think anyway. I wanted to just test a fix to if that was right, but SpanNearQuery can contain any span queries, so I guess all of them might need clone impls and we may have to clone the whole chain? A little tired to think about it at the moment Looks like the issue is with the cloning in SpanNearQuery though.
        Hide
        Mark Miller added a comment -

        Test + Fix (properly implement clone on near,or,not span queries)

        Show
        Mark Miller added a comment - Test + Fix (properly implement clone on near,or,not span queries)

          People

          • Assignee:
            Mark Miller
            Reporter:
            Billow Gao
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development