Lucene - Core / LUCENE-7643

Move IndexOrDocValuesQuery to queries (or core?)

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.5
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was just doing some benchmarking to check that IndexOrDocValuesQuery actually makes things faster when it is supposed to:

                          Task   QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                       Range25          30.27      (0.6%)       29.22      (4.7%)   -3.5% (  -8% -    1%)
                       Range10          66.74      (0.9%)       64.52      (4.2%)   -3.3% (  -8% -    1%)
                        Term35          18.59      (1.6%)       18.16      (1.9%)   -2.3% (  -5% -    1%)
                        Term02         274.98      (1.1%)      269.47      (1.9%)   -2.0% (  -4% -    1%)
              AndTerm35Range10          26.82      (2.5%)       26.50      (2.8%)   -1.2% (  -6% -    4%)
              AndTerm02Range25          56.27      (1.3%)       99.04      (7.9%)   76.0% (  65% -   86%)
      

      In the above results, the number after the query type indicates the percentage of docs in the index that it matches. With the baseline, range queries are simple point range queries, while the patch uses an IndexOrDocValuesQuery that wraps both a point range query and a doc values query that matches the same documents. As expected, AndTerm35Range10 performs the same in both cases: the range is supposed to lead the iteration, so the IndexOrDocValuesQuery is rewritten to the wrapped point range query. However, with AndTerm02Range25 the range cost is higher than the term cost, so the range is only used for verifying matches and the IndexOrDocValuesQuery rewrites to the wrapped doc values query, yielding a speedup since we do not have to evaluate the range against the whole index.
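
The choice described above can be illustrated with a toy model. This is only a sketch of the cost comparison, not Lucene's implementation: the class `ExecutionChoice`, the method `choose`, and the use of match percentages as costs are all assumptions made for illustration, and the real heuristic may weigh the costs differently.

```java
/** Toy model of the execution choice described above; this is not Lucene code. */
final class ExecutionChoice {

    /** How the range clause is evaluated. */
    enum Strategy { POINTS, DOC_VALUES }

    /**
     * Picks how to run the range clause of a conjunction.
     *
     * @param otherClauseCost estimated matches of the cheapest other clause
     *                        (Double.POSITIVE_INFINITY when the range runs alone)
     * @param rangeCost       estimated matches of the range clause
     */
    static Strategy choose(double otherClauseCost, double rangeCost) {
        if (rangeCost <= otherClauseCost) {
            // The range is the cheapest clause, so it leads the iteration:
            // enumerate its matches from the points index.
            return Strategy.POINTS;
        }
        // Another clause leads; the range only verifies the candidates it
        // produces, which a per-document doc values lookup can do.
        return Strategy.DOC_VALUES;
    }

    public static void main(String[] args) {
        // Costs are the match percentages from the benchmark task names.
        System.out.println("AndTerm35Range10 -> " + choose(35, 10));
        System.out.println("AndTerm02Range25 -> " + choose(2, 25));
        System.out.println("Range25 alone    -> " + choose(Double.POSITIVE_INFINITY, 25));
    }
}
```

Under these assumptions, only AndTerm02Range25 ends up on the doc values path, which matches the one task that shows a speedup in the table.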

      I think the -2/-3% difference we are seeing for everything other than AndTerm02Range25 is noise, since term queries execute exactly the same way in both cases yet show this slight slowdown too.

      I would like to make it easier to use by moving IndexOrDocValuesQuery and DocValuesRangeQuery to a different module than sandbox, and giving the doc values range query an API that is closer to point ranges by making the bounds required (null disallowed) and removing the includeLower and includeUpper parameters. I wanted to move to queries initially, but maybe core is better: that way we could link from the point API to IndexOrDocValuesQuery as a way to make queries on fields that have both points and doc values more efficient.
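
With required, inclusive bounds, a caller that previously relied on null bounds or on `includeLower`/`includeUpper` has to translate them first. A minimal sketch of that translation for `long` values follows. The class `InclusiveBounds` and its `of` method are hypothetical helpers, not part of Lucene. The sketch assumes the convention used for point ranges: an open end is expressed as `Long.MIN_VALUE` or `Long.MAX_VALUE`, and an exclusive bound is shifted by one with `Math.addExact`.

```java
/** Hypothetical helper, not part of Lucene: turns legacy-style bounds into inclusive ones. */
final class InclusiveBounds {
    final long lower;
    final long upper;

    private InclusiveBounds(long lower, long upper) {
        this.lower = lower;
        this.upper = upper;
    }

    /**
     * @param lower        lower bound, or null for "no lower bound"
     * @param includeLower whether the lower bound itself matches
     * @param upper        upper bound, or null for "no upper bound"
     * @param includeUpper whether the upper bound itself matches
     */
    static InclusiveBounds of(Long lower, boolean includeLower, Long upper, boolean includeUpper) {
        long lo = Long.MIN_VALUE; // null lower bound: open-ended below
        if (lower != null) {
            // addExact throws ArithmeticException instead of wrapping around
            // when an exclusive bound sits at Long.MAX_VALUE.
            lo = includeLower ? lower.longValue() : Math.addExact(lower.longValue(), 1L);
        }
        long hi = Long.MAX_VALUE; // null upper bound: open-ended above
        if (upper != null) {
            hi = includeUpper ? upper.longValue() : Math.addExact(upper.longValue(), -1L);
        }
        return new InclusiveBounds(lo, hi);
    }

    public static void main(String[] args) {
        // (10, 20] expressed with inclusive bounds
        InclusiveBounds b = of(10L, false, 20L, true);
        System.out.println("[" + b.lower + ", " + b.upper + "]");
    }
}
```

The resulting pair can then be passed unchanged to both the point range query and the doc values range query, so the two wrapped queries match the same documents.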

      1. LUCENE-7643.patch
        88 kB
        Adrien Grand

          Activity

          mikemccand Michael McCandless added a comment -

          +1 to promote these queries to core.

          dsmiley David Smiley added a comment -

          +1

          jpountz Adrien Grand added a comment -

          Here is a patch. Doc values queries are exposed as factory methods on the *DocValuesField classes, like for points. I also added specialization for the single-value case by unwrapping the singleton whenever possible and improved documentation of IndexOrDocValuesQuery with a usage example. I think it is ready?

          mikemccand Michael McCandless added a comment -

          +1

          jira-bot ASF subversion and git services added a comment -

          Commit 71ca2a84bad2495eff3b0b15dc445f3f013ea4af in lucene-solr's branch refs/heads/master from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71ca2a8 ]

          LUCENE-7643: Move IndexOrDocValuesQuery to core.

          jira-bot ASF subversion and git services added a comment -

          Commit f57e0177ffd3f367de81bdf7f2ad67ad0f94264a in lucene-solr's branch refs/heads/master from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f57e017 ]

          LUCENE-7643: Fix leftover.

          dsmiley David Smiley added a comment -

          Nice; I also like the addition of the query methods to the field type. Does DocValuesRangeQuery need to be public now?

          What's up with this change in PointRangeQuery?

          public Scorer get(boolean randomAccess) throws IOException {
          -              if (values.getDocCount() == reader.maxDoc()
          +              if (false && values.getDocCount() == reader.maxDoc()
                             && values.getDocCount() == values.size()
          
          jpountz Adrien Grand added a comment - - edited

          This change in PointRangeQuery was unintended; I just removed it.

          Does DocValuesRangeQuery need to be public now?

          Not sure what you mean since this class has been removed. Its new counterparts are indeed package-private but I don't think they need to be public, do they?

          jira-bot ASF subversion and git services added a comment -

          Commit 20b7dfae42810ea4c345355735d732bdbb191150 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=20b7dfa ]

          LUCENE-7643: Move IndexOrDocValuesQuery to core.

          dsmiley David Smiley added a comment -

          Its new counterparts are indeed package-private

          Oh right; that's all I meant.

          Thanks Adrien.

          hossman Hoss Man added a comment -

          Something about this change appears to have introduced an NPE risk that one of Solr's randomized tests caught (see SOLR-10013 for full details)...

             [junit4]    > Throwable #1: java.lang.RuntimeException: Exception during query
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([690818771545E96F:51983624D9EDF0F4]:0)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:821)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:788)
             [junit4]    > 	at org.apache.solr.schema.DocValuesTest.testFloatAndDoubleRangeQueryRandom(DocValuesTest.java:618)
          ...
             [junit4]    > Caused by: java.lang.NullPointerException
             [junit4]    > 	at org.apache.lucene.document.SortedNumericDocValuesRangeQuery$1$1.matches(SortedNumericDocValuesRangeQuery.java:114)
             [junit4]    > 	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:253)
             [junit4]    > 	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
             [junit4]    > 	at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669)
          ...
          
          jpountz Adrien Grand added a comment -

          Woops sorry for that. I will not be able to look into it before Monday so feel free to either revert the change or mute the test until then and I will have a look.

          hossman Hoss Man added a comment -

          Adrien Grand: pretty sure the problem is somewhere in this optimization you introduced...

          ...I also added specialization for the single-value case by unwrapping the singleton whenever possible...

          If I force singleton to always be null (in SortedNumericDocValuesRangeQuery) the seed passes.

          I'm thinking that rather than reverting your entire commit (on trunk and 6x), I'll just commit a small change to remove this optimization (from both SortedNumericDocValuesRangeQuery and SortedSetDocValuesRangeQuery). That way we can at least let Jenkins keep hammering on the rest of your changes, and you can decide later if the optimization can be fixed.

          I'm currently running all tests with that change. Once that's done I'll verify that the non-randomized Lucene test Steve just added to SOLR-10013 also passes with the optimization disabled, and commit.

          If anyone would prefer I roll back completely, please speak up.

          hossman Hoss Man added a comment -

          Spun off the optimization into LUCENE-7649.

          jira-bot ASF subversion and git services added a comment -

          Commit c9262602f06d3fdaa2ec8809a6948aaed72bc0b1 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c926260 ]

          SOLR-10013: Fix DV range query bug introduced by LUCENE-7643 by disabling and optimization (LUCENE-7649 to track re-enabling or removing completely)

          Conflicts:
          lucene/core/src/java/org/apache/lucene/document/SortedNumericDocValuesRangeQuery.java
          lucene/core/src/java/org/apache/lucene/document/SortedSetDocValuesRangeQuery.java

          jira-bot ASF subversion and git services added a comment -

          Commit b0db06bad568b7eedf528379a2fe5ac935992d56 in lucene-solr's branch refs/heads/master from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0db06b ]

          SOLR-10013: Fix DV range query bug introduced by LUCENE-7643 by disabling and optimization (LUCENE-7649 to track re-enabling or removing completely)

          jira-bot ASF subversion and git services added a comment -

          Commit a5b5df419c7f5bc1a94bc2fa0c1b8ba87b8159f8 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a5b5df4 ]

          LUCENE-7643,SOLR-10013: Reenable the single-value optimization.

          jira-bot ASF subversion and git services added a comment -

          Commit 6693c261e5782bc49dea92002745a91215c4166e in lucene-solr's branch refs/heads/master from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6693c26 ]

          LUCENE-7643,SOLR-10013: Reenable the single-value optimization.

          jira-bot ASF subversion and git services added a comment -

          Commit 71ca2a84bad2495eff3b0b15dc445f3f013ea4af in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71ca2a8 ]

          LUCENE-7643: Move IndexOrDocValuesQuery to core.

          jira-bot ASF subversion and git services added a comment -

          Commit f57e0177ffd3f367de81bdf7f2ad67ad0f94264a in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f57e017 ]

          LUCENE-7643: Fix leftover.

          jira-bot ASF subversion and git services added a comment -

          Commit b0db06bad568b7eedf528379a2fe5ac935992d56 in lucene-solr's branch refs/heads/apiv2 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0db06b ]

          SOLR-10013: Fix DV range query bug introduced by LUCENE-7643 by disabling and optimization (LUCENE-7649 to track re-enabling or removing completely)

          jira-bot ASF subversion and git services added a comment -

          Commit 6693c261e5782bc49dea92002745a91215c4166e in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6693c26 ]

          LUCENE-7643,SOLR-10013: Reenable the single-value optimization.

          jira-bot ASF subversion and git services added a comment -

          Commit a36ebaa90c95d8be6411464c237593a1ff825af0 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a36ebaa ]

          LUCENE-7643,SOLR-10013: Reenable the single-value optimization for sorted dv too.

          jira-bot ASF subversion and git services added a comment -

          Commit 0215c65ac56a1faef100caf3eafb6fd85eaa337d in lucene-solr's branch refs/heads/master from Adrien Grand
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0215c65 ]

          LUCENE-7643,SOLR-10013: Reenable the single-value optimization for sorted dv too.


            People

            • Assignee:
              Unassigned
              Reporter:
              jpountz Adrien Grand
            • Votes:
              0
              Watchers:
              4