Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7871

false positive match BlockJoinSelector[SortedDV] when child value is absent

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, master (8.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      • fix false positive match for SortedSetDV
      • make children an iterator instead of bitset.
        see the comment
      1. LUCENE-7871.patch
        27 kB
        Mikhail Khludnev
      2. LUCENE-7871.patch
        44 kB
        Mikhail Khludnev
      3. LUCENE-7871.patch
        23 kB
        Mikhail Khludnev
      4. LUCENE-7871.patch
        19 kB
        Mikhail Khludnev
      5. LUCENE-7871.patch
        15 kB
        Mikhail Khludnev

        Issue Links

          Activity

          Hide
          mkhludnev Mikhail Khludnev added a comment -

          follow up SOLR-11006 for refactoring children to DISI.

          Show
          mkhludnev Mikhail Khludnev added a comment - follow up SOLR-11006 for refactoring children to DISI.
          Hide
          jpountz Adrien Grand added a comment -
          Show
          jpountz Adrien Grand added a comment - Thanks Mikhail Khludnev !
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit bb2d6c128ff74d6164c5c60ac952074c1b5a5b94 in lucene-solr's branch refs/heads/master from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb2d6c1 ]

          LUCENE-7871: fixing CHANGES.txt, mark it as Lucene 7.0 bug fix.

          Show
          jira-bot ASF subversion and git services added a comment - Commit bb2d6c128ff74d6164c5c60ac952074c1b5a5b94 in lucene-solr's branch refs/heads/master from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb2d6c1 ] LUCENE-7871 : fixing CHANGES.txt, mark it as Lucene 7.0 bug fix.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 706d2018153d907642c77ae3673b99142b0d1734 in lucene-solr's branch refs/heads/master from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=706d201 ]

          LUCENE-7871: fix false positive match in BlockJoinSelector when children have no value.

          • introducing BlockJoinSelector.wrap methods accepting children as DISI.
          • extracting ToParentDocValues
          Show
          jira-bot ASF subversion and git services added a comment - Commit 706d2018153d907642c77ae3673b99142b0d1734 in lucene-solr's branch refs/heads/master from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=706d201 ] LUCENE-7871 : fix false positive match in BlockJoinSelector when children have no value. introducing BlockJoinSelector.wrap methods accepting children as DISI. extracting ToParentDocValues
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          ..going, going

          Show
          mkhludnev Mikhail Khludnev added a comment - ..going, going
          Hide
          mkhludnev Mikhail Khludnev added a comment - - edited

          LUCENE-7871.patch

          • fixes TestBlockJoinSelector.java for case when SortedDV has no vals for kids.
          • extracts ToParentDocValues into hairish OO piece
          • introduces BlockJoinSelector.wrap() acepting childred as DISI, no usage yet, existing are deprecated.
            Is there any veto, request to hold on? or it's fine to push it forward?
          Show
          mkhludnev Mikhail Khludnev added a comment - - edited LUCENE-7871.patch fixes TestBlockJoinSelector.java for case when SortedDV has no vals for kids. extracts ToParentDocValues into hairish OO piece introduces BlockJoinSelector.wrap() acepting childred as DISI, no usage yet, existing are deprecated. Is there any veto, request to hold on? or it's fine to push it forward?
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Adrien Grand, can you give a feedback for API ie for supplying children as a Query?

          Show
          mkhludnev Mikhail Khludnev added a comment - Adrien Grand , can you give a feedback for API ie for supplying children as a Query ?
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          hasValue and seen seem t..

          Ok. Thanks. I've collapsed them.

          I did is non backward compatible due to child Query. But turning child Query to DISI turned out soo hard. I had to reproduce ValueSource.ValueSourceSortField trick with weight and context map. But now ToParentBlockJoinSortField should be rewriten before searching. I find it not really convenient, but looks like it's what ValueSourceSortField users live with, see SolrIndexSearcher.weightSort(Sort) (I know), and TestFunctionQuerySort as well. I wonder if we can do this simpler?

          Finally I'm still not a fan of the

          Thankfully it doesn't sound like veto. Does it? I renamed it to the package level ToParentDocValues and pack both twins (sorted and numerics) into it. So, we can think that internally this code is duplicated.
          I propose this OO-hairish stuff because the current duplicated code introduced the bug, and I'm afraid it's caused exactly by this duplication.

          Show
          mkhludnev Mikhail Khludnev added a comment - hasValue and seen seem t.. Ok. Thanks. I've collapsed them. I did is non backward compatible due to child Query. But turning child Query to DISI turned out soo hard. I had to reproduce ValueSource.ValueSourceSortField trick with weight and context map. But now ToParentBlockJoinSortField should be rewriten before searching. I find it not really convenient, but looks like it's what ValueSourceSortField users live with, see SolrIndexSearcher.weightSort(Sort) (I know), and TestFunctionQuerySort as well. I wonder if we can do this simpler? Finally I'm still not a fan of the Thankfully it doesn't sound like veto. Does it? I renamed it to the package level ToParentDocValues and pack both twins (sorted and numerics) into it. So, we can think that internally this code is duplicated. I propose this OO-hairish stuff because the current duplicated code introduced the bug, and I'm afraid it's caused exactly by this duplication.
          Hide
          jpountz Adrien Grand added a comment -

          Since the patch is for 7.0 only, it is fine to break backward compatibility, so let's just remove the methods that take a bitset instead of deprecating them? hasValue and seen seem to store the same information, don't they? Finally I'm still not a fan of the ToParentIterator and Accumulator abstractions. I liked it better before, even if that meant a bit of duplication.

          Show
          jpountz Adrien Grand added a comment - Since the patch is for 7.0 only, it is fine to break backward compatibility, so let's just remove the methods that take a bitset instead of deprecating them? hasValue and seen seem to store the same information, don't they? Finally I'm still not a fan of the ToParentIterator and Accumulator abstractions. I liked it better before, even if that meant a bit of duplication.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Can you move ToParentIterator to the join package and make it pkg-private?

          Thanks it went really well. LUCENE-7871.patch I also moved some its' guts inside.
          Now BlockJoinSelector's methods accept DocIdSetIterator children, keeping all BitSet children-accepting methods deprecated.

          The question is, what we allow ToParentBlockJoinSortField to accept as children? Is it Query? or we have some other abstract per-segment DISI <<factory>>?

          Show
          mkhludnev Mikhail Khludnev added a comment - Can you move ToParentIterator to the join package and make it pkg-private? Thanks it went really well. LUCENE-7871.patch I also moved some its' guts inside. Now BlockJoinSelector 's methods accept DocIdSetIterator children , keeping all BitSet children -accepting methods deprecated. The question is, what we allow ToParentBlockJoinSortField to accept as children? Is it Query ? or we have some other abstract per-segment DISI <<factory>>?
          Hide
          jpountz Adrien Grand added a comment -

          Correct, ConjunctionDISI is always on the same doc ID as its subs. Can you move ToParentIterator to the join package and make it pkg-private? I think you did things this way in order to be able to extend DocValuesIterator but actually you can just extend DocIdSetIterator and define an additional advanceExact method?

          Show
          jpountz Adrien Grand added a comment - Correct, ConjunctionDISI is always on the same doc ID as its subs. Can you move ToParentIterator to the join package and make it pkg-private? I think you did things this way in order to be able to extend DocValuesIterator but actually you can just extend DocIdSetIterator and define an additional advanceExact method?
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Adrien Grand, here is LUCENE-7871.patch what I have so far. ToParentIterator is the new abstraction based on ConjunctionDISI. btw, is ConjunctionDISI steady? Can we assert that ConjunctionDISI.docID is always the same as docVals subiterator docID? I just worry that's not true for Disjunction's one (obviously).

          Show
          mkhludnev Mikhail Khludnev added a comment - Adrien Grand , here is LUCENE-7871.patch what I have so far. ToParentIterator is the new abstraction based on ConjunctionDISI . btw, is ConjunctionDISI steady? Can we assert that ConjunctionDISI.docID is always the same as docVals subiterator docID? I just worry that's not true for Disjunction's one (obviously).
          Hide
          jpountz Adrien Grand added a comment -

          I don't think extracting common logic helps in that case since it needs to introduce a new abstraction.

          Show
          jpountz Adrien Grand added a comment - I don't think extracting common logic helps in that case since it needs to introduce a new abstraction.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          I started LUCENE-7835 from extracting common logic between numDV and sortedDV LUCENE-7871.patch (it's a little bit harish), and faced false positive for sortedDV. I'll try incorporate Adrien Grand clue regarding ConjunctionDISI

          Show
          mkhludnev Mikhail Khludnev added a comment - I started LUCENE-7835 from extracting common logic between numDV and sortedDV LUCENE-7871.patch (it's a little bit harish), and faced false positive for sortedDV. I'll try incorporate Adrien Grand clue regarding ConjunctionDISI

            People

            • Assignee:
              Unassigned
              Reporter:
              mkhludnev Mikhail Khludnev
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development