Lucene - Core
  1. Lucene - Core
  2. LUCENE-6022

DocValuesDocIdSet: check deleted docs before doc values

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When live documents are not null, DocValuesDocIdSet checks if doc values match the document before the live docs. Given that checking if doc values match could involve a heavy computation (eg. geo distance) and that the default codec has live docs in memory but doc values on disk, I think it makes more sense to check live docs first?

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a patch.

        Show
        Adrien Grand added a comment - Here is a patch.
        Hide
        Ryan Ernst added a comment -

        +1, patch looks good. And I especially like the simplification to nextDoc() to just use advance(doc + 1)...we should do this in more places.

        Show
        Ryan Ernst added a comment - +1, patch looks good. And I especially like the simplification to nextDoc() to just use advance(doc + 1) ...we should do this in more places.
        Hide
        ASF subversion and git services added a comment -

        Commit 1633879 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1633879 ]

        LUCENE-6022: DocValuesDocIdSet checks live docs before doc values.

        Show
        ASF subversion and git services added a comment - Commit 1633879 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1633879 ] LUCENE-6022 : DocValuesDocIdSet checks live docs before doc values.
        Hide
        Adrien Grand added a comment -

        And I especially like the simplification to nextDoc() to just use advance(doc + 1)...we should do this in more places.

        Maybe we should make it the default impl of nextDoc() in DocIdSetIterator (it's an abstract class, not an interface).

        Show
        Adrien Grand added a comment - And I especially like the simplification to nextDoc() to just use advance(doc + 1)...we should do this in more places. Maybe we should make it the default impl of nextDoc() in DocIdSetIterator (it's an abstract class, not an interface).
        Hide
        ASF subversion and git services added a comment -

        Commit 1633883 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633883 ]

        LUCENE-6022: DocValuesDocIdSet checks live docs before doc values.

        Show
        ASF subversion and git services added a comment - Commit 1633883 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633883 ] LUCENE-6022 : DocValuesDocIdSet checks live docs before doc values.
        Hide
        Adrien Grand added a comment -

        Thanks for the review Ryan!

        Show
        Adrien Grand added a comment - Thanks for the review Ryan!
        Hide
        David Smiley added a comment -

        +1 yeah the patch is good.

        This class got me curious what it's for. Despite its name and what it's javadocs claim, it doesn't appear to have anything to do with DocValues, even if some DocValues code might in turn use it. It looks remarkably like FilteredDocIdSet but instead of wrapping another DocIdSet, it wraps a Bits. I've seem somewhat similar classes actually. I propose that it be renamed to not have DocValues in its name. But to what? BitsFilteredDocIdSet is already taken. It needn't have Bits in the name; that could be just a feature (optional Bits filter).
        gotta go now...

        Show
        David Smiley added a comment - +1 yeah the patch is good. This class got me curious what it's for. Despite its name and what it's javadocs claim, it doesn't appear to have anything to do with DocValues, even if some DocValues code might in turn use it. It looks remarkably like FilteredDocIdSet but instead of wrapping another DocIdSet, it wraps a Bits. I've seem somewhat similar classes actually. I propose that it be renamed to not have DocValues in its name. But to what? BitsFilteredDocIdSet is already taken. It needn't have Bits in the name; that could be just a feature (optional Bits filter). gotta go now...
        Hide
        Adrien Grand added a comment -

        I agree this thing could be merged with FilteredDocIdSet. The only additional thing it has is the optimization when deleted docs are a bit set but I'm wondering if it really helps in practice given that we try to merge more agressively segments that have lots of deleted documents.

        Show
        Adrien Grand added a comment - I agree this thing could be merged with FilteredDocIdSet. The only additional thing it has is the optimization when deleted docs are a bit set but I'm wondering if it really helps in practice given that we try to merge more agressively segments that have lots of deleted documents.
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development