Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7276

Add an optional reason to the MatchNoDocsQuery

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (7.0), 6.2
    • Component/s: core/search
    • Labels:
    • Lucene Fields:
      New, Patch Available

      Description

      It's sometimes difficult to debug a query that results in a MatchNoDocsQuery. The MatchNoDocsQuery is always rewritten in an empty boolean query.
      This patch adds an optional reason and implements a weight in order to keep track of the reason why the query did not match any document. The reason is printed on toString and when an explanation for noMatch is asked.
      For instance the query:
      new MatchNoDocsQuery("Field not found").toString()
      => 'MatchNoDocsQuery["field 'title' not found"]'

      1. LUCENE-7276.patch
        24 kB
        Michael McCandless
      2. LUCENE-7276.patch
        13 kB
        Michael McCandless
      3. LUCENE-7276.patch
        26 kB
        Michael McCandless
      4. LUCENE-7276.patch
        8 kB
        Michael McCandless
      5. LUCENE-7276.patch
        8 kB
        Michael McCandless
      6. LUCENE-7276.patch
        6 kB
        Jim Ferenczi

        Activity

        Hide
        jim.ferenczi Jim Ferenczi added a comment -

        Patch available.

        Show
        jim.ferenczi Jim Ferenczi added a comment - Patch available.
        Hide
        martijn.v.groningen Martijn van Groningen added a comment -

        +1

        Show
        martijn.v.groningen Martijn van Groningen added a comment - +1
        Hide
        christoph buescher Christoph Buescher added a comment -

        +1

        Show
        christoph buescher Christoph Buescher added a comment - +1
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Jim Ferenczi, this looks great, I'll commit this soon.

        Show
        mikemccand Michael McCandless added a comment - Thanks Jim Ferenczi , this looks great, I'll commit this soon.
        Hide
        mikemccand Michael McCandless added a comment -

        I tweaked the patch a bit (fixed whitespace, fixed some failing test cases), but now I hit this curious failure:

           [junit4]   1> query: (#*:* *:* (-*:*) -(#body:b #(()^4.0)^10.0)^2.0)~1
           [junit4]   1>   after rewrite: (#*:* *:* MatchNoDocsQuery[""] -(#body:b #()^40.0)^2.0)~1
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanRewrites -Dtests.method=testRandom -Dtests.seed=FAE50D783718C6BE -Dtests.locale=sv -Dtests.timezone=Navajo -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] FAILURE 0.33s | TestBooleanRewrites.testRandom <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: doc=0 expected:<0.3535533845424652> but was:<0.5>
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FAE50D783718C6BE:88A92877867870CD]:0)
           [junit4]    > 	at org.apache.lucene.search.TestBooleanRewrites.assertEquals(TestBooleanRewrites.java:346)
           [junit4]    > 	at org.apache.lucene.search.TestBooleanRewrites.testRandom(TestBooleanRewrites.java:290)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {body=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{}, maxPointsInLeafNode=367, maxMBSortInHeap=6.926568898697452, sim=ClassicSimilarity, locale=sv, timezone=Navajo
           [junit4]   2> NOTE: Linux 3.13.0-87-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=399197264,total=504889344
           [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanRewrites]
        

        Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something ... I'll dig.

        Show
        mikemccand Michael McCandless added a comment - I tweaked the patch a bit (fixed whitespace, fixed some failing test cases), but now I hit this curious failure: [junit4] 1> query: (#*:* *:* (-*:*) -(#body:b #(()^4.0)^10.0)^2.0)~1 [junit4] 1> after rewrite: (#*:* *:* MatchNoDocsQuery[""] -(#body:b #()^40.0)^2.0)~1 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanRewrites -Dtests.method=testRandom -Dtests.seed=FAE50D783718C6BE -Dtests.locale=sv -Dtests.timezone=Navajo -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] FAILURE 0.33s | TestBooleanRewrites.testRandom <<< [junit4] > Throwable #1: java.lang.AssertionError: doc=0 expected:<0.3535533845424652> but was:<0.5> [junit4] > at __randomizedtesting.SeedInfo.seed([FAE50D783718C6BE:88A92877867870CD]:0) [junit4] > at org.apache.lucene.search.TestBooleanRewrites.assertEquals(TestBooleanRewrites.java:346) [junit4] > at org.apache.lucene.search.TestBooleanRewrites.testRandom(TestBooleanRewrites.java:290) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene62): {body=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{}, maxPointsInLeafNode=367, maxMBSortInHeap=6.926568898697452, sim=ClassicSimilarity, locale=sv, timezone=Navajo [junit4] 2> NOTE: Linux 3.13.0-87-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=399197264,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestBooleanRewrites] Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something ... I'll dig.
        Hide
        mikemccand Michael McCandless added a comment -

        New patch, fixing the test failure by wrapping in a BoostQuery with boost 0.0 in BQ when it rewrites!

        Show
        mikemccand Michael McCandless added a comment - New patch, fixing the test failure by wrapping in a BoostQuery with boost 0.0 in BQ when it rewrites!
        Hide
        mikemccand Michael McCandless added a comment -

        Another iteration, fixing query parser test failures. I also fixed BooleanQuery with no clauses to rewrite to MatchNoDocsQuery.

        Who knew so many places cared about MatchNoDocsQuery!

        Show
        mikemccand Michael McCandless added a comment - Another iteration, fixing query parser test failures. I also fixed BooleanQuery with no clauses to rewrite to MatchNoDocsQuery . Who knew so many places cared about MatchNoDocsQuery !
        Hide
        dsmiley David Smiley added a comment -

        Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something

        New patch, fixing the test failure by wrapping in a BoostQuery with boost 0.0 in BQ when it rewrites!

        Hmm; this feels odd. If it needs to stay then it would be helpful to add some comments where this occurs to explain; I don't really get it. Do you think it might make sense to remove the wrapping BoostQuery for 7.0?

        Thanks. And nice addition Jim!

        Show
        dsmiley David Smiley added a comment - Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something New patch, fixing the test failure by wrapping in a BoostQuery with boost 0.0 in BQ when it rewrites! Hmm; this feels odd. If it needs to stay then it would be helpful to add some comments where this occurs to explain; I don't really get it. Do you think it might make sense to remove the wrapping BoostQuery for 7.0? Thanks. And nice addition Jim!
        Hide
        mikemccand Michael McCandless added a comment -

        It definitely is odd, and I don't like it, but I don't know how else to make the test happy on this weird corner case.

        I will put a comment attempting to explain the situation.

        Show
        mikemccand Michael McCandless added a comment - It definitely is odd, and I don't like it, but I don't know how else to make the test happy on this weird corner case. I will put a comment attempting to explain the situation.
        Hide
        mikemccand Michael McCandless added a comment -

        I'm going to break this up into separate steps ... the current patch combines too many things: changing MatchNoDocsQuery.toString, adding a reason to it, changing it to return its own Weight instead of rewriting to BQ, changing BQ to rewrite to it when there are no clauses.

        I'll start by fixing MatchNoDocsQuery.toString() to simply return "MatchNoDocsQuery". But even that small change is not easy: it is something of a break in our normal policy for a Query.toString in that today queries try to have a .toString which when sent back through a query parser would parse back to an equivalent query instance. E.g., a PhraseQuery instance's toString would be "foo bar", which most query parsers would parse back into an equivalent query.

        The Python world handles this situation nicely, by having two separate functions: repr makes a string which when eval'd (parsed and executed by Python) turns back into the same object, whereas str makes a pretty thing that humans can understand. But Lucene has no such separation ... and our Query.toString is more like repr now.

        Is anyone besides me worried about this? Do we have other queries that return a "human but not query-parser consumable" string?

        Show
        mikemccand Michael McCandless added a comment - I'm going to break this up into separate steps ... the current patch combines too many things: changing MatchNoDocsQuery.toString , adding a reason to it, changing it to return its own Weight instead of rewriting to BQ, changing BQ to rewrite to it when there are no clauses. I'll start by fixing MatchNoDocsQuery.toString() to simply return "MatchNoDocsQuery". But even that small change is not easy: it is something of a break in our normal policy for a Query.toString in that today queries try to have a .toString which when sent back through a query parser would parse back to an equivalent query instance. E.g., a PhraseQuery instance's toString would be "foo bar" , which most query parsers would parse back into an equivalent query. The Python world handles this situation nicely, by having two separate functions: repr makes a string which when eval'd (parsed and executed by Python) turns back into the same object, whereas str makes a pretty thing that humans can understand. But Lucene has no such separation ... and our Query.toString is more like repr now. Is anyone besides me worried about this? Do we have other queries that return a "human but not query-parser consumable" string?
        Hide
        mikemccand Michael McCandless added a comment -

        OK here's the simple toString change, step 1 of 3 or 4. I think it's ready.

        I fixed some query parser tests to more directly check that they got the MatchNoDocsQuery, or an empty BooleanQuery, instead of relying on toString to do the check.

        Show
        mikemccand Michael McCandless added a comment - OK here's the simple toString change, step 1 of 3 or 4. I think it's ready. I fixed some query parser tests to more directly check that they got the MatchNoDocsQuery , or an empty BooleanQuery , instead of relying on toString to do the check.
        Hide
        dsmiley David Smiley added a comment -

        I can understand if you feel you want to break up the issue for whatever reason, and to be cautious about backwards-compatibility (thus delaying parts of this issue to 7.0?)

        I don't think we should have guarantees that a Query.toString() will be parseable by Lucene's classic QueryParser. That is not the case today for many queries, but for the common core queries, it's probably true. And good point that if some app out there depends on this behavior, we might not want to introduce the change in 6.x of BooleanQuery optimizing to MatchNoDocsQuery.

        Show
        dsmiley David Smiley added a comment - I can understand if you feel you want to break up the issue for whatever reason, and to be cautious about backwards-compatibility (thus delaying parts of this issue to 7.0?) I don't think we should have guarantees that a Query.toString() will be parseable by Lucene's classic QueryParser. That is not the case today for many queries, but for the common core queries, it's probably true. And good point that if some app out there depends on this behavior, we might not want to introduce the change in 6.x of BooleanQuery optimizing to MatchNoDocsQuery.
        Hide
        jpountz Adrien Grand added a comment -

        Is anyone besides me worried about this? Do we have other queries that return a "human but not query-parser consumable" string?

        Even though this is something you should not rely on, I suspect many users do... I think David is right that changing the toString representation should probably be master-only. However, we should still be able to put the reason in the explanation in 6.x?

        Show
        jpountz Adrien Grand added a comment - Is anyone besides me worried about this? Do we have other queries that return a "human but not query-parser consumable" string? Even though this is something you should not rely on, I suspect many users do... I think David is right that changing the toString representation should probably be master-only. However, we should still be able to put the reason in the explanation in 6.x?
        Hide
        jim.ferenczi Jim Ferenczi added a comment - - edited

        Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something ... I'll dig.

        I tried to find a reason and I think I found something interesting. The change is related to the normalization factor and the fact that those queries are boosted. When you use a boolean query with no clause the normalization factor is 0, when the matchnodocs query is used the normalization factor is 1 (BooleanWeight.getValueForNormalization and ConstantScoreWeight.getValueForNormalization).
        This part of the query is supposed to return no documents so it should be ok to ignore it when the query norm is computed. Though for the distributed case where results are merged from different shards there is no guarantee that the rewrite will be the same among the shards.
        I think we can get rid of the matchnodocsquery vs empty boolean query difference if we change the return value of BooleanWeight.getValueForNormalization to be 1 (instead of 0) when there is no clause.

        https://issues.apache.org/jira/browse/LUCENE-7337

        Show
        jim.ferenczi Jim Ferenczi added a comment - - edited Somehow the test is angry that the rewritten query scores differently from the original ... so somehow the fact that we no longer rewrite to an empty BQ is changing something ... I'll dig. I tried to find a reason and I think I found something interesting. The change is related to the normalization factor and the fact that those queries are boosted. When you use a boolean query with no clause the normalization factor is 0, when the matchnodocs query is used the normalization factor is 1 (BooleanWeight.getValueForNormalization and ConstantScoreWeight.getValueForNormalization). This part of the query is supposed to return no documents so it should be ok to ignore it when the query norm is computed. Though for the distributed case where results are merged from different shards there is no guarantee that the rewrite will be the same among the shards. I think we can get rid of the matchnodocsquery vs empty boolean query difference if we change the return value of BooleanWeight.getValueForNormalization to be 1 (instead of 0) when there is no clause. https://issues.apache.org/jira/browse/LUCENE-7337
        Hide
        mikemccand Michael McCandless added a comment -

        Wow thank you for digging on this Jim Ferenczi ... I've been meaning to get back to this issue, and if we can fix that scoring issue (separately) that will make it much easier.

        Show
        mikemccand Michael McCandless added a comment - Wow thank you for digging on this Jim Ferenczi ... I've been meaning to get back to this issue, and if we can fix that scoring issue (separately) that will make it much easier.
        Hide
        mikemccand Michael McCandless added a comment -

        I don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now.

        A right not exercised is soon forgotten.

        Show
        mikemccand Michael McCandless added a comment - I don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now. A right not exercised is soon forgotten.
        Hide
        jpountz Adrien Grand added a comment -

        I don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now.

        +1

        Show
        jpountz Adrien Grand added a comment - I don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now. +1
        Hide
        dsmiley David Smiley added a comment -

        don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now.

        Yeah, +1

        Show
        dsmiley David Smiley added a comment - don't think we have ever, nor should we ever, make a guarantee that MatchNoDocsQuery.toString would somehow round-trip through a query parser back to itself, and so I think we are free to improve it here/now. Yeah, +1
        Hide
        mikemccand Michael McCandless added a comment -

        Another iteration ... the patch was much simpler now that LUCENE-7337 is done.

        Show
        mikemccand Michael McCandless added a comment - Another iteration ... the patch was much simpler now that LUCENE-7337 is done.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit cbbc505268e8fa994fa21383ed49a91b2e923f66 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cbbc505 ]

        LUCENE-7276: MatchNoDocsQuery now inclues an optional reason for why it was used

        Show
        jira-bot ASF subversion and git services added a comment - Commit cbbc505268e8fa994fa21383ed49a91b2e923f66 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cbbc505 ] LUCENE-7276 : MatchNoDocsQuery now inclues an optional reason for why it was used
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit df2207c5dcf379af25d12ef3b3cd7f44bad4fdff in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df2207c ]

        LUCENE-7276: MatchNoDocsQuery now inclues an optional reason for why it was used

        Show
        jira-bot ASF subversion and git services added a comment - Commit df2207c5dcf379af25d12ef3b3cd7f44bad4fdff in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df2207c ] LUCENE-7276 : MatchNoDocsQuery now inclues an optional reason for why it was used
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Jim Ferenczi!

        Show
        mikemccand Michael McCandless added a comment - Thanks Jim Ferenczi !
        Hide
        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.

        Show
        mikemccand Michael McCandless added a comment - Bulk close resolved issues after 6.2.0 release.

          People

          • Assignee:
            Unassigned
            Reporter:
            jim.ferenczi Jim Ferenczi
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development