Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7810

false positive equality: distinctly diff join queries return equals()==true

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.6, 6.7
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      While working on SOLR-10583 I was getting some odd test failures that seemed to suggest we were getting false cache hits for Join queries that should have been unique.

      tracing thorugh the code, the problem seems to be the way TermsQuery implements equals(Object). This class takes in the fromQuery (used to identify set of documents we "join from") and uses it in the equals calculation – but the information about the join field is never passed directly to TermsQuery and the BytesRefs that are passed in can't be compared efficiently (AFAICT), so 2 completely diff calls to JoinUtils.createJoinQuery(...) can result in Query objects that think they are equal() even when they most certainly are not.

      At a brief glance, it appears that similar bugs exist in TermsIncludingScoreQuery (and possibly GlobalOrdinalsWithScoreQuery, but i didn't look into that class at all)

      1. LUCENE_7810.patch
        34 kB
        Martijn van Groningen
      2. LUCENE_7810.patch
        27 kB
        Martijn van Groningen
      3. LUCENE-7810.patch
        4 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          hossman Hoss Man added a comment -

          The attached patch includes a trivial test case demonstraing the bug I found – but this only executes one code path of JoinUtils.createJoinQuery(...) – there are almost certainly other code paths with similar bugs that should also be tested. (see nocommit comments in test)

          A few NOTEs:

          1. At present, due to some code I don't really understand in how Solr only leverages JoinUtils in rewritten queries, it appears that this bug does not impact current Solr usecases. The patch also includes a trivial test showing that the "wrapper" queries solr creates around JoinUtils don't have this same equality problem
          2. i discovered this bug purely by fluke because in my originally POC code for SOLR-10583 I used JoinUtils.createJoinQuery(...) directly instead of refactoring Solr's JoinQParserPlugin code so i could re-use it – doing that refactoring is my nextstep for that issue, so I won't personally be pursuing fixing this bug (particularly since i'm not really sure what a good fix should look like)
          Show
          hossman Hoss Man added a comment - The attached patch includes a trivial test case demonstraing the bug I found – but this only executes one code path of JoinUtils.createJoinQuery(...) – there are almost certainly other code paths with similar bugs that should also be tested. (see nocommit comments in test) A few NOTEs: At present, due to some code I don't really understand in how Solr only leverages JoinUtils in rewritten queries, it appears that this bug does not impact current Solr usecases. The patch also includes a trivial test showing that the "wrapper" queries solr creates around JoinUtils don't have this same equality problem i discovered this bug purely by fluke because in my originally POC code for SOLR-10583 I used JoinUtils.createJoinQuery(...) directly instead of refactoring Solr's JoinQParserPlugin code so i could re-use it – doing that refactoring is my nextstep for that issue, so I won't personally be pursuing fixing this bug (particularly since i'm not really sure what a good fix should look like)
          Hide
          hossman Hoss Man added a comment -

          updated summary that i realize now was value/missleading about the "false positive" situation.

          Show
          hossman Hoss Man added a comment - updated summary that i realize now was value/missleading about the "false positive" situation.
          Hide
          jpountz Adrien Grand added a comment -

          +1 this is a bug. Actually the join query does not even matter for equality since it does not impact the matching hits. This query should only check the field (through its parent MultiTermQuery) and the list of the terms to match.

          I can add it to my TODO list if nobody else has cycles.

          Show
          jpountz Adrien Grand added a comment - +1 this is a bug. Actually the join query does not even matter for equality since it does not impact the matching hits. This query should only check the field (through its parent MultiTermQuery ) and the list of the terms to match. I can add it to my TODO list if nobody else has cycles.
          Hide
          hossman Hoss Man added a comment -

          Thanks for sanity checking me ... i was worried i must have been overlooking something obvious in the intended usage of JoinUtils.

          +1 this is a bug. Actually the join query does not even matter for equality ... This query should only check ... list of the terms to match.

          you mean the from query, correct?

          FWIW: I gather the reason that was used at all is because there's no easy easy/efficient way to do an equality check directly on the BytesRefHash containing the terms

          Show
          hossman Hoss Man added a comment - Thanks for sanity checking me ... i was worried i must have been overlooking something obvious in the intended usage of JoinUtils. +1 this is a bug. Actually the join query does not even matter for equality ... This query should only check ... list of the terms to match. you mean the from query, correct? FWIW: I gather the reason that was used at all is because there's no easy easy/efficient way to do an equality check directly on the BytesRefHash containing the terms
          Hide
          martijn.v.groningen Martijn van Groningen added a comment -

          Good catch Hoss Man! If nobody is working on this then I can fix this bug. So far only the TermsQuery seems to not take into account the join field.

          you mean the from query, correct?

          I think this is what Adrien Grand means, because equality checking the collected terms would be too expensive.

          I think the TermsIncludingScoreQuery, TermsQuery, PointInSetIncludingScoreQuery and PointInSetQuery should also take index reader context id into like the GlobalOrdinalsQuery is doing. Otherwise the fromQuery + join field key can still be invalid. (mainly when docs on the from side added)

          Show
          martijn.v.groningen Martijn van Groningen added a comment - Good catch Hoss Man ! If nobody is working on this then I can fix this bug. So far only the TermsQuery seems to not take into account the join field. you mean the from query, correct? I think this is what Adrien Grand means, because equality checking the collected terms would be too expensive. I think the TermsIncludingScoreQuery , TermsQuery , PointInSetIncludingScoreQuery and PointInSetQuery should also take index reader context id into like the GlobalOrdinalsQuery is doing. Otherwise the fromQuery + join field key can still be invalid. (mainly when docs on the from side added)
          Hide
          martijn.v.groningen Martijn van Groningen added a comment -

          Added patch based on Hoss Man's patch that adds more tests and fixes equals() method in TermsIncludingScoreQuery and TermsQuery. The global ordinal based queries did already implement equals() correctly and the numeric join's equals() method was also working correctly because it is comparing the actual collected points.

          Show
          martijn.v.groningen Martijn van Groningen added a comment - Added patch based on Hoss Man 's patch that adds more tests and fixes equals() method in TermsIncludingScoreQuery and TermsQuery . The global ordinal based queries did already implement equals() correctly and the numeric join's equals() method was also working correctly because it is comparing the actual collected points.
          Hide
          jpountz Adrien Grand added a comment -

          I actually meant the BytesRefHash in my previous comment as comparing only the from query could still lead to false positives on different index readers that have segments in common. However Martijn's idea to take the index reader context identifier into account in equals/hashCode is better I think as it would make comparisons faster.

          These queries should also take the score mode into account for equals/hashCode I think? I read your comment about the fact that it is not needed for query caching, but I think two queries should only be equal if they matche the same docs and give them the same scores. If we want to be able te reuse cache entries that have different score modes, we could rewrite to a TermsQuery in createWeight, similarly to how BooleanQuery rewrites all MUST clauses into FILTER clauses when needsScores is false?

          ScoreMode scoreMode1 = scoreModes.toArray(new ScoreMode[0])[random().nextInt(scoreModes.size())];
          

          I think you could do RandomPicks.randomFrom(random(), scoreModes).

          Otherwise +1

          Show
          jpountz Adrien Grand added a comment - I actually meant the BytesRefHash in my previous comment as comparing only the from query could still lead to false positives on different index readers that have segments in common. However Martijn's idea to take the index reader context identifier into account in equals/hashCode is better I think as it would make comparisons faster. These queries should also take the score mode into account for equals/hashCode I think? I read your comment about the fact that it is not needed for query caching, but I think two queries should only be equal if they matche the same docs and give them the same scores. If we want to be able te reuse cache entries that have different score modes, we could rewrite to a TermsQuery in createWeight, similarly to how BooleanQuery rewrites all MUST clauses into FILTER clauses when needsScores is false? ScoreMode scoreMode1 = scoreModes.toArray( new ScoreMode[0])[random().nextInt(scoreModes.size())]; I think you could do RandomPicks.randomFrom(random(), scoreModes) . Otherwise +1
          Hide
          martijn.v.groningen Martijn van Groningen added a comment -

          > If we want to be able te reuse cache entries that have different score modes, we could rewrite to a TermsQuery in createWeight, similarly to how BooleanQuery rewrites all MUST clauses into FILTER clauses when needsScores is false?

          Good idea. I'll make this change.

          Show
          martijn.v.groningen Martijn van Groningen added a comment - > If we want to be able te reuse cache entries that have different score modes, we could rewrite to a TermsQuery in createWeight, similarly to how BooleanQuery rewrites all MUST clauses into FILTER clauses when needsScores is false? Good idea. I'll make this change.
          Hide
          martijn.v.groningen Martijn van Groningen added a comment -

          Adrien Grand I've updated the patch. Score mode is now taken into account in equals(...) and hashcode(...) methods and in case a scoring query is used when no scores are needed then it the query gets replaced with the non scoring variant.

          Show
          martijn.v.groningen Martijn van Groningen added a comment - Adrien Grand I've updated the patch. Score mode is now taken into account in equals(...) and hashcode(...) methods and in case a scoring query is used when no scores are needed then it the query gets replaced with the non scoring variant.
          Hide
          jpountz Adrien Grand added a comment -

          +1 maybe just replace query.rewrite(IndexReader) with searcher.rewrite(query) when you rewrite in createWeight since the latter guarantees to rewrite recursively until the query is fully rewritten

          Show
          jpountz Adrien Grand added a comment - +1 maybe just replace query.rewrite(IndexReader) with searcher.rewrite(query) when you rewrite in createWeight since the latter guarantees to rewrite recursively until the query is fully rewritten
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 85c1319c7678e9deaccb7b7add50fa3f465bd44f in lucene-solr's branch refs/heads/master from Martijn van Groningen
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85c1319 ]

          LUCENE-7810: Fix equals() and hashCode() methods of several join queries.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 85c1319c7678e9deaccb7b7add50fa3f465bd44f in lucene-solr's branch refs/heads/master from Martijn van Groningen [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85c1319 ] LUCENE-7810 : Fix equals() and hashCode() methods of several join queries.
          Hide
          hossman Hoss Man added a comment -

          Looks like jira-bot missed the 6x backport commit...

          http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/e30fbd68

          Show
          hossman Hoss Man added a comment - Looks like jira-bot missed the 6x backport commit... http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/e30fbd68
          Hide
          jpountz Adrien Grand added a comment -

          Should we backport to branch_6_6 since we are re-spinning?

          Show
          jpountz Adrien Grand added a comment - Should we backport to branch_6_6 since we are re-spinning?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 7f62567609e4ffd3181a9c3e713c9736c6604fa6 in lucene-solr's branch refs/heads/branch_6_6 from Martijn van Groningen
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7f62567 ]

          LUCENE-7810: Fix equals() and hashCode() methods of several join queries.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 7f62567609e4ffd3181a9c3e713c9736c6604fa6 in lucene-solr's branch refs/heads/branch_6_6 from Martijn van Groningen [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7f62567 ] LUCENE-7810 : Fix equals() and hashCode() methods of several join queries.
          Hide
          martijn.v.groningen Martijn van Groningen added a comment -

          The change has been backported to 6.6 branch too now.

          Show
          martijn.v.groningen Martijn van Groningen added a comment - The change has been backported to 6.6 branch too now.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 65bfa48770b33242d80dcce946073cc5698da7b1 in lucene-solr's branch refs/heads/master from Martijn van Groningen
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65bfa48 ]

          LUCENE-7810: Fix numeric join equals test failure.

          Numeric join equals isn't based on the index reader, but rather on the collected join values.
          In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 65bfa48770b33242d80dcce946073cc5698da7b1 in lucene-solr's branch refs/heads/master from Martijn van Groningen [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65bfa48 ] LUCENE-7810 : Fix numeric join equals test failure. Numeric join equals isn't based on the index reader, but rather on the collected join values. In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a4cc08278ca7f707c349c293625b46de2ed94642 in lucene-solr's branch refs/heads/branch_6x from Martijn van Groningen
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a4cc082 ]

          LUCENE-7810: Fix numeric join equals test failure.

          Numeric join equals isn't based on the index reader, but rather on the collected join values.
          In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.

          Show
          jira-bot ASF subversion and git services added a comment - Commit a4cc08278ca7f707c349c293625b46de2ed94642 in lucene-solr's branch refs/heads/branch_6x from Martijn van Groningen [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a4cc082 ] LUCENE-7810 : Fix numeric join equals test failure. Numeric join equals isn't based on the index reader, but rather on the collected join values. In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 87107084c3a90f7ef253c00423b12cc1790f8c2f in lucene-solr's branch refs/heads/branch_6_6 from Martijn van Groningen
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8710708 ]

          LUCENE-7810: Fix numeric join equals test failure.

          Numeric join equals isn't based on the index reader, but rather on the collected join values.
          In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 87107084c3a90f7ef253c00423b12cc1790f8c2f in lucene-solr's branch refs/heads/branch_6_6 from Martijn van Groningen [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8710708 ] LUCENE-7810 : Fix numeric join equals test failure. Numeric join equals isn't based on the index reader, but rather on the collected join values. In a test failure during the second indexing round no new join values were indexed causing the equals assertion to fail.

            People

            • Assignee:
              Unassigned
              Reporter:
              hossman Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development