Lucene - Core
  1. Lucene - Core
  2. LUCENE-6021

Make FixedBitSet and SparseFixedBitSet share a wider common interface

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, the only common interfaces that these two classes share are Bits and Accountable. I would like to add a BitSet base class that would be both extended by FixedBitSet and SparseFixedBitSet. The idea is to share more code between these two impls and make them interchangeable for more use-cases so that we could just use one or the other based on the density of the data that we are working on.

      1. LUCENE-6021.patch
        121 kB
        Adrien Grand
      2. LUCENE-6021.patch
        121 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a patch:

        • oal.util.BitSet has the get/set/clear methods you would expect on a BitSet
        • SparseFixedBitSet and FixedBitSet extend BitSet
        • BitSet also has nextSetBit but the contract of this method has been changed to return 2^31-1 instead of -1 when there are no more bits set, it makes it possible to reuse this method in DocIdSetIterator.advance/nextDoc
        • BitSet also has default implementations for and(DISI), or(DISI) and andNot(DISI) than can be overridden by sub-classes (eg. FixedBitSet does)
        • or/and/andNot have also been added to DocIdSetBuilder
        • BooleanFilter now uses DocIdSetBuilder to build the bitset of matching docs instead of enforcing a FixedBitSet (so it will sometimes use a SparseFixedBitSet when data is sparse)
        Show
        Adrien Grand added a comment - Here is a patch: oal.util.BitSet has the get/set/clear methods you would expect on a BitSet SparseFixedBitSet and FixedBitSet extend BitSet BitSet also has nextSetBit but the contract of this method has been changed to return 2^31-1 instead of -1 when there are no more bits set, it makes it possible to reuse this method in DocIdSetIterator.advance/nextDoc BitSet also has default implementations for and(DISI), or(DISI) and andNot(DISI) than can be overridden by sub-classes (eg. FixedBitSet does) or/and/andNot have also been added to DocIdSetBuilder BooleanFilter now uses DocIdSetBuilder to build the bitset of matching docs instead of enforcing a FixedBitSet (so it will sometimes use a SparseFixedBitSet when data is sparse)
        Hide
        Uwe Schindler added a comment -

        Cool.

        In the patch you have the abstract cardinality() method on BitSet abstract, too. So BitDocIdSet's 2nd ctor does not need to be specialized to FixedBitSet?

        Show
        Uwe Schindler added a comment - Cool. In the patch you have the abstract cardinality() method on BitSet abstract, too. So BitDocIdSet's 2nd ctor does not need to be specialized to FixedBitSet?
        Hide
        Adrien Grand added a comment -

        Good point Uwe. The reason why I did it this way is that for SparseFixedBitSet it makes more sense to use #aproximateCardinality() to get a cost. So I refactored a bit the patch to add an #approximateCardinality() method on BitSet that forwards to #cardinality() for FixedBitSet and is overridden by SparseFixedBitSet to use linear counting. Then BitDocIdSet's 2nd constructor doesn't specialize for FixedBitSet and uses approximateCardinality to get a cost in all cases. Does it make sense to you?

        Show
        Adrien Grand added a comment - Good point Uwe. The reason why I did it this way is that for SparseFixedBitSet it makes more sense to use #aproximateCardinality() to get a cost. So I refactored a bit the patch to add an #approximateCardinality() method on BitSet that forwards to #cardinality() for FixedBitSet and is overridden by SparseFixedBitSet to use linear counting. Then BitDocIdSet's 2nd constructor doesn't specialize for FixedBitSet and uses approximateCardinality to get a cost in all cases. Does it make sense to you?
        Hide
        Uwe Schindler added a comment -

        That is fine to me! I was just wondering why you did this specialization at all. Thanks for the explanation!

        Show
        Uwe Schindler added a comment - That is fine to me! I was just wondering why you did this specialization at all. Thanks for the explanation!
        Hide
        ASF subversion and git services added a comment -

        Commit 1634012 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1634012 ]

        LUCENE-6021: Make SparseFixedBitSet and FixedBitSet share a common "BitSet" interface.

        Show
        ASF subversion and git services added a comment - Commit 1634012 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1634012 ] LUCENE-6021 : Make SparseFixedBitSet and FixedBitSet share a common "BitSet" interface.
        Hide
        ASF subversion and git services added a comment -

        Commit 1634013 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1634013 ]

        LUCENE-6021: Make SparseFixedBitSet and FixedBitSet share a common "BitSet" interface.

        Show
        ASF subversion and git services added a comment - Commit 1634013 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1634013 ] LUCENE-6021 : Make SparseFixedBitSet and FixedBitSet share a common "BitSet" interface.
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development