Lucene - Core
  1. Lucene - Core
  2. LUCENE-6554

ToBlockJoinFieldComparator wrapping is illegal

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The following test case triggers an AssertionError:

        public void testMissingValues() throws IOException {
          final Directory dir = newDirectory();
          final RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig(new MockAnalyzer(random()))
              .setMergePolicy(NoMergePolicy.INSTANCE));
          w.addDocument(new Document());
          w.getReader().close();
          w.addDocument(new Document());
          IndexReader reader = w.getReader();
          w.close();
          IndexSearcher searcher = newSearcher(reader);
          // all docs are parent
          BitDocIdSetFilter parentFilter = new BitDocIdSetCachingWrapperFilter(new QueryWrapperFilter(new MatchAllDocsQuery()));
          BitDocIdSetFilter childFilter = new BitDocIdSetCachingWrapperFilter(new QueryWrapperFilter(new MatchNoDocsQuery()));
          ToParentBlockJoinSortField sortField = new ToParentBlockJoinSortField(
              "some_random_field", SortField.Type.STRING, false, parentFilter, childFilter
          );
          Sort sort = new Sort(sortField);
          TopFieldDocs topDocs = searcher.search(new MatchAllDocsQuery(), 1, sort);
          searcher.getIndexReader().close();
          dir.close();
        }
      
      java.lang.AssertionError
      	at __randomizedtesting.SeedInfo.seed([E9D45D81F597AE4B:83490FC7D11D9ABA]:0)
      	at org.apache.lucene.search.FieldComparator$TermOrdValComparator.setBottom(FieldComparator.java:800)
      	at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:783)
      	at org.apache.lucene.search.join.ToParentBlockJoinFieldComparator.doSetNextReader(ToParentBlockJoinFieldComparator.java:83)
      	at org.apache.lucene.search.SimpleFieldComparator.getLeafComparator(SimpleFieldComparator.java:36)
      	at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
      	at org.apache.lucene.search.TopFieldCollector$NonScoringCollector.getLeafCollector(TopFieldCollector.java:141)
      	at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:40)
      	at org.apache.lucene.search.AssertingCollector.getLeafCollector(AssertingCollector.java:48)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:611)
      	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:424)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:543)
      	at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:528)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:455)
      	at org.apache.lucene.search.join.TestBlockJoinSorting.testMissingValues(TestBlockJoinSorting.java:347)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:483)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
      	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
      	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
      	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
      	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
      	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
      	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
      	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
      	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
      	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
      	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
      	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
      	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
      	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
      	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
      	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
      	at java.lang.Thread.run(Thread.java:745)
      

      The reason is that when a parent document does not have children, ToParentBlockJoinComparator simply omits to forward calls to copy to the wrapped comparator. So the wrapped comparator ends up with allocated slots that have 0 as an ordinal (the default value in an array) and a null value, which is illegal since 0 is a legal ordinal which can't map to null.

      This can't be fixed without adding new methods to the already crazy comparator API, so I think there is nothing we can do but remove this comparator.

      It would be possible to achieve the same functionnality by implementing something similar to SortedNumericSelector, except that it would have to select across several docs instead of values.

      1. LUCENE-6554.patch
        35 kB
        Adrien Grand
      2. LUCENE-6554.patch
        32 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        I tried to think about a replacement for this feature and it would require us to implement a selector for all combinations of dv types and sort modes. Additionally, these selectors would have the downside of being linear with the number of child documents, which I don't think is something we should encourage: it would be more efficient to store aggregated information at the parent level directly.

        So I propose that we just remove the functionnality from the join module.

        Show
        Adrien Grand added a comment - I tried to think about a replacement for this feature and it would require us to implement a selector for all combinations of dv types and sort modes. Additionally, these selectors would have the downside of being linear with the number of child documents, which I don't think is something we should encourage: it would be more efficient to store aggregated information at the parent level directly. So I propose that we just remove the functionnality from the join module.
        Hide
        Martijn van Groningen added a comment -

        I would like this feature to remain. I think implementing this via a selector is okay, its more code, but that shouldn't be too bad. I don't like the workaround, because the aggregated information relies on the child query and to aggregate it properly the child information needs to be aggregated multiple times. I think it is good that someone can choose to avoid duplication of data with block join sorting at the cost that this is linear to the number of matching child documents.

        Show
        Martijn van Groningen added a comment - I would like this feature to remain. I think implementing this via a selector is okay, its more code, but that shouldn't be too bad. I don't like the workaround, because the aggregated information relies on the child query and to aggregate it properly the child information needs to be aggregated multiple times. I think it is good that someone can choose to avoid duplication of data with block join sorting at the cost that this is linear to the number of matching child documents.
        Hide
        Adrien Grand added a comment -

        Here is a patch that removes ToBlockJoinFieldComparator and adds selectors for min/max across blocks of documents that either have SORTED, SORTED_SET, NUMERIC or SORTED_NUMERIC doc values.

        The API of ToParentBlockJoinSortField is left unchanged so it should be pretty transparent to users, the only difference is that it will not work anymore if they used to sort using values (STRING_VAL sort type) or custom comparators.

        Show
        Adrien Grand added a comment - Here is a patch that removes ToBlockJoinFieldComparator and adds selectors for min/max across blocks of documents that either have SORTED, SORTED_SET, NUMERIC or SORTED_NUMERIC doc values. The API of ToParentBlockJoinSortField is left unchanged so it should be pretty transparent to users, the only difference is that it will not work anymore if they used to sort using values (STRING_VAL sort type) or custom comparators.
        Hide
        Martijn van Groningen added a comment -

        +1 this looks good, thanks for doing this Adrien Grand!

        Show
        Martijn van Groningen added a comment - +1 this looks good, thanks for doing this Adrien Grand !
        Hide
        ASF subversion and git services added a comment -

        Commit 1688599 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1688599 ]

        LUCENE-6554: Removed illegal ToBlockJoinFieldComparator in favor of doc values selectors.

        Show
        ASF subversion and git services added a comment - Commit 1688599 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1688599 ] LUCENE-6554 : Removed illegal ToBlockJoinFieldComparator in favor of doc values selectors.
        Hide
        ASF subversion and git services added a comment -

        Commit 1688610 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1688610 ]

        LUCENE-6554: Add missing javadocs.

        Show
        ASF subversion and git services added a comment - Commit 1688610 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1688610 ] LUCENE-6554 : Add missing javadocs.
        Hide
        ASF subversion and git services added a comment -

        Commit 1688611 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1688611 ]

        LUCENE-6554: Removed illegal ToBlockJoinFieldComparator in favor of doc values selectors.

        Show
        ASF subversion and git services added a comment - Commit 1688611 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1688611 ] LUCENE-6554 : Removed illegal ToBlockJoinFieldComparator in favor of doc values selectors.
        Hide
        Adrien Grand added a comment -

        Thanks Martijn!

        Show
        Adrien Grand added a comment - Thanks Martijn!
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development