Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      • fix false positive match for SortedSetDV
      • make children iterator instead of bitset.
        see the comment
      1. LUCENE-7871.patch
        44 kB
        Mikhail Khludnev
      2. LUCENE-7871.patch
        23 kB
        Mikhail Khludnev
      3. LUCENE-7871.patch
        19 kB
        Mikhail Khludnev
      4. LUCENE-7871.patch
        15 kB
        Mikhail Khludnev

        Issue Links

          Activity

          Hide
          mkhludnev Mikhail Khludnev added a comment -

          I started LUCENE-7835 from extracting common logic between numDV and sortedDV LUCENE-7871.patch (it's a little bit harish), and faced false positive for sortedDV. I'll try incorporate Adrien Grand clue regarding ConjunctionDISI

          Show
          mkhludnev Mikhail Khludnev added a comment - I started LUCENE-7835 from extracting common logic between numDV and sortedDV LUCENE-7871.patch (it's a little bit harish), and faced false positive for sortedDV. I'll try incorporate Adrien Grand clue regarding ConjunctionDISI
          Hide
          jpountz Adrien Grand added a comment -

          I don't think extracting common logic helps in that case since it needs to introduce a new abstraction.

          Show
          jpountz Adrien Grand added a comment - I don't think extracting common logic helps in that case since it needs to introduce a new abstraction.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Adrien Grand, here is LUCENE-7871.patch what I have so far. ToParentIterator is the new abstraction based on ConjunctionDISI. btw, is ConjunctionDISI steady? Can we assert that ConjunctionDISI.docID is always the same as docVals subiterator docID? I just worry that's not true for Disjunction's one (obviously).

          Show
          mkhludnev Mikhail Khludnev added a comment - Adrien Grand , here is LUCENE-7871.patch what I have so far. ToParentIterator is the new abstraction based on ConjunctionDISI . btw, is ConjunctionDISI steady? Can we assert that ConjunctionDISI.docID is always the same as docVals subiterator docID? I just worry that's not true for Disjunction's one (obviously).
          Hide
          jpountz Adrien Grand added a comment -

          Correct, ConjunctionDISI is always on the same doc ID as its subs. Can you move ToParentIterator to the join package and make it pkg-private? I think you did things this way in order to be able to extend DocValuesIterator but actually you can just extend DocIdSetIterator and define an additional advanceExact method?

          Show
          jpountz Adrien Grand added a comment - Correct, ConjunctionDISI is always on the same doc ID as its subs. Can you move ToParentIterator to the join package and make it pkg-private? I think you did things this way in order to be able to extend DocValuesIterator but actually you can just extend DocIdSetIterator and define an additional advanceExact method?
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Can you move ToParentIterator to the join package and make it pkg-private?

          Thanks it went really well. LUCENE-7871.patch I also moved some its' guts inside.
          Now BlockJoinSelector's methods accept DocIdSetIterator children, keeping all BitSet children-accepting methods deprecated.

          The question is, what we allow ToParentBlockJoinSortField to accept as children? Is it Query? or we have some other abstract per-segment DISI <<factory>>?

          Show
          mkhludnev Mikhail Khludnev added a comment - Can you move ToParentIterator to the join package and make it pkg-private? Thanks it went really well. LUCENE-7871.patch I also moved some its' guts inside. Now BlockJoinSelector 's methods accept DocIdSetIterator children , keeping all BitSet children -accepting methods deprecated. The question is, what we allow ToParentBlockJoinSortField to accept as children? Is it Query ? or we have some other abstract per-segment DISI <<factory>>?
          Hide
          jpountz Adrien Grand added a comment -

          Since the patch is for 7.0 only, it is fine to break backward compatibility, so let's just remove the methods that take a bitset instead of deprecating them? hasValue and seen seem to store the same information, don't they? Finally I'm still not a fan of the ToParentIterator and Accumulator abstractions. I liked it better before, even if that meant a bit of duplication.

          Show
          jpountz Adrien Grand added a comment - Since the patch is for 7.0 only, it is fine to break backward compatibility, so let's just remove the methods that take a bitset instead of deprecating them? hasValue and seen seem to store the same information, don't they? Finally I'm still not a fan of the ToParentIterator and Accumulator abstractions. I liked it better before, even if that meant a bit of duplication.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          hasValue and seen seem t..

          Ok. Thanks. I've collapsed them.

          I did is non backward compatible due to child Query. But turning child Query to DISI turned out soo hard. I had to reproduce ValueSource.ValueSourceSortField trick with weight and context map. But now ToParentBlockJoinSortField should be rewriten before searching. I find it not really convenient, but looks like it's what ValueSourceSortField users live with, see SolrIndexSearcher.weightSort(Sort) (I know), and TestFunctionQuerySort as well. I wonder if we can do this simpler?

          Finally I'm still not a fan of the

          Thankfully it doesn't sound like veto. Does it? I renamed it to the package level ToParentDocValues and pack both twins (sorted and numerics) into it. So, we can think that internally this code is duplicated.
          I propose this OO-hairish stuff because the current duplicated code introduced the bug, and I'm afraid it's caused exactly by this duplication.

          Show
          mkhludnev Mikhail Khludnev added a comment - hasValue and seen seem t.. Ok. Thanks. I've collapsed them. I did is non backward compatible due to child Query. But turning child Query to DISI turned out soo hard. I had to reproduce ValueSource.ValueSourceSortField trick with weight and context map. But now ToParentBlockJoinSortField should be rewriten before searching. I find it not really convenient, but looks like it's what ValueSourceSortField users live with, see SolrIndexSearcher.weightSort(Sort) (I know), and TestFunctionQuerySort as well. I wonder if we can do this simpler? Finally I'm still not a fan of the Thankfully it doesn't sound like veto. Does it? I renamed it to the package level ToParentDocValues and pack both twins (sorted and numerics) into it. So, we can think that internally this code is duplicated. I propose this OO-hairish stuff because the current duplicated code introduced the bug, and I'm afraid it's caused exactly by this duplication.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Adrien Grand, can you give a feedback for API ie for supplying children as a Query?

          Show
          mkhludnev Mikhail Khludnev added a comment - Adrien Grand , can you give a feedback for API ie for supplying children as a Query ?

            People

            • Assignee:
              Unassigned
              Reporter:
              mkhludnev Mikhail Khludnev
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development