Lucene - Core
  1. Lucene - Core
  2. LUCENE-6427

BitSet fixes - assert on presence of 'ghost bits' and others

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests:

      • Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here).
        • cardinality, nextSetBit, intersects and others may yield wrong results
        • If ghost bits are present, they may become visible after ensureCapacity is called.
        • The tests deliberately create bitsets with ghost bits, but then do not detect these failures
      • FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use

        Activity

        Hide
        ASF GitHub Bot added a comment -

        GitHub user LucVL opened a pull request:

        https://github.com/apache/lucene-solr/pull/142

        Lucene 6427 bit set fixes

        Pull request for LUCENE-6427

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/LucVL/lucene-solr lucene-6427-BitSetFixes

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/lucene-solr/pull/142.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #142


        commit 957e52debe29ef25cbcd5d260a794d3965e7b717
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-13T17:22:12Z

        TestFixedBitSet: Added testBits2Words with various values, concentrating on boundary cases (like for TestLongBitSet)

        commit 0ad2f508dfefb74c79ed2bf8ea8a8882503bcb1b
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T09:52:58Z

        TestLongBitSet.testBits2Words: Show Integer.MAX_VALUE boundary more clearly

        commit 8e569310862bd9512bb5320ff6a2c7e19eeb348b
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T10:07:39Z

        Aligned TestFixedBitSet.java and TestLongBitSet.java source files for easier comparison, no real changes.
        Only real change is in TestLongBitSet.testSmall to have the same range as in TestFixedBitSet.testSmall

        commit 4ac986ca0f505984cdae88a248c057f94d0fb9cd
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T14:39:24Z

        Removed obsolete mod 64 from FixedBitSet.flip(int)
        Added LongBitSet.flip(long)
        Updated TestLongBitSet accordingly

        commit e67844ddf9a20394ed57209c7d0c333ccd6af9a4
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T07:01:41Z

        TestFixedBitSet.prove that tests create illegal bitsets

        commit 06af400f51bf9ccd603d9edc976ae552478f5773
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T14:47:30Z

        Add same proofs to TestLongBitSet

        commit e606cd6e73eac7bb7111f44402d87c852141d200
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-10T16:26:53Z

        Added verifyGhostBitsClear method that is called in an assert in ensureCapacity to avoid errors in later calls.
        Marked methods that are prone to such errors in comments
        Aligned source code of FixedBitSet and LongBitSet for easier side-by-side comparison
        Test now fail on this verifyGhostBitsClear assert

        commit 1ed5f16bb892bc19d588d995e759f123d9d267c7
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-15T08:12:32Z

        TestFixedBitSet: added more tests to demonstrate spurious failures when ghost bits are present
        Temporarily disabled assert in FixedBitSet constructor

        commit 7ca7c6566fbcc9c841339fd1ec2209ceead6aa9f
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-14T15:39:00Z

        makeFixedBitSet and makeLongBitSet should not create ghost bits
        Tests pass

        commit d696ebc11c7db43fac0d2a93db10fc50d69e32e9
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-15T10:04:31Z

        Merge branch 'trunk' into BitSetFixes-GhostBits

        commit e7bab7c6890928227fe18e783ec07489ba272623
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-15T13:36:58Z

        Performance: use numWords instead of bits.length in e.g. cardinality()
        Consistency: Use numBits instead of length()

        commit 8d0753cb1ba07e67d518d0037f27fba0bfc8dfb4
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-15T13:50:38Z

        Provide a 'slow' isEmpty() implementation that is still faster than any external one could be (e.g.: nextSetBit(0) != -1 for LongBitSet)
        Called it checkIfEmpty() to emphasize the cost of calling it.


        Show
        ASF GitHub Bot added a comment - GitHub user LucVL opened a pull request: https://github.com/apache/lucene-solr/pull/142 Lucene 6427 bit set fixes Pull request for LUCENE-6427 You can merge this pull request into a Git repository by running: $ git pull https://github.com/LucVL/lucene-solr lucene-6427-BitSetFixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #142 commit 957e52debe29ef25cbcd5d260a794d3965e7b717 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-13T17:22:12Z TestFixedBitSet: Added testBits2Words with various values, concentrating on boundary cases (like for TestLongBitSet) commit 0ad2f508dfefb74c79ed2bf8ea8a8882503bcb1b Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T09:52:58Z TestLongBitSet.testBits2Words: Show Integer.MAX_VALUE boundary more clearly commit 8e569310862bd9512bb5320ff6a2c7e19eeb348b Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T10:07:39Z Aligned TestFixedBitSet.java and TestLongBitSet.java source files for easier comparison, no real changes. Only real change is in TestLongBitSet.testSmall to have the same range as in TestFixedBitSet.testSmall commit 4ac986ca0f505984cdae88a248c057f94d0fb9cd Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T14:39:24Z Removed obsolete mod 64 from FixedBitSet.flip(int) Added LongBitSet.flip(long) Updated TestLongBitSet accordingly commit e67844ddf9a20394ed57209c7d0c333ccd6af9a4 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T07:01:41Z TestFixedBitSet.prove that tests create illegal bitsets commit 06af400f51bf9ccd603d9edc976ae552478f5773 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T14:47:30Z Add same proofs to TestLongBitSet commit e606cd6e73eac7bb7111f44402d87c852141d200 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-10T16:26:53Z Added verifyGhostBitsClear method that is called in an assert in ensureCapacity to avoid errors in later calls. Marked methods that are prone to such errors in comments Aligned source code of FixedBitSet and LongBitSet for easier side-by-side comparison Test now fail on this verifyGhostBitsClear assert commit 1ed5f16bb892bc19d588d995e759f123d9d267c7 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-15T08:12:32Z TestFixedBitSet: added more tests to demonstrate spurious failures when ghost bits are present Temporarily disabled assert in FixedBitSet constructor commit 7ca7c6566fbcc9c841339fd1ec2209ceead6aa9f Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-14T15:39:00Z makeFixedBitSet and makeLongBitSet should not create ghost bits Tests pass commit d696ebc11c7db43fac0d2a93db10fc50d69e32e9 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-15T10:04:31Z Merge branch 'trunk' into BitSetFixes-GhostBits commit e7bab7c6890928227fe18e783ec07489ba272623 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-15T13:36:58Z Performance: use numWords instead of bits.length in e.g. cardinality() Consistency: Use numBits instead of length() commit 8d0753cb1ba07e67d518d0037f27fba0bfc8dfb4 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-15T13:50:38Z Provide a 'slow' isEmpty() implementation that is still faster than any external one could be (e.g.: nextSetBit(0) != -1 for LongBitSet) Called it checkIfEmpty() to emphasize the cost of calling it.
        Hide
        Luc Vanlerberghe added a comment - - edited

        I still have some issues with the ensureCapacity methods:
        The doc comment states:

           * If the given {@link FixedBitSet} is large enough to hold {@code numBits},
           * returns the given bits, otherwise returns a new {@link FixedBitSet} which
           * can hold the requested number of bits.
        

        while actually it checks if it is large enough to hold (numBits+1) bits (I already changed the doc comment in one of the commits of my pull request to reflect this).
        In lucene/solr the typical usage is:

        1.     docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
              docsWithField.set(docID);
          

          but also:

        2.       newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
                newbits.or(otherDocSet.bits);
          

        (1) only works because the doc comment doesn't correspond to the implementation. Correct usage would be ... ensureCapacity(docsWithField, docID + 1)
        (2) will unexpectly grow newBits even when it is exactly the same size.

        The implementation is written as if the numBits argument is a "number of bits" value, but then proceeds to allocate at least an extra long in the backing array...

        There are several options here:

        1. Only update the doc comment (like I did) so unsuspecting users don't get an unexpected performance hit when manipulating equal sized bitsets, but then the name stays awkward.
        2. Fix the implementation and update all locations in lucene/solr where it is used (but this may/will affect custom modules without warning)
        3. Rename the methods and "numBits" argument (ensureIndexAvailable anyone?). This will break the compilation of custom modules, but it's @lucene.internal anyway, not a public api.
        Show
        Luc Vanlerberghe added a comment - - edited I still have some issues with the ensureCapacity methods: The doc comment states: * If the given {@link FixedBitSet} is large enough to hold {@code numBits}, * returns the given bits, otherwise returns a new {@link FixedBitSet} which * can hold the requested number of bits. while actually it checks if it is large enough to hold (numBits+1) bits (I already changed the doc comment in one of the commits of my pull request to reflect this). In lucene/solr the typical usage is: docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID); docsWithField.set(docID); but also: newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length()); newbits.or(otherDocSet.bits); (1) only works because the doc comment doesn't correspond to the implementation. Correct usage would be ... ensureCapacity(docsWithField, docID + 1) (2) will unexpectly grow newBits even when it is exactly the same size. The implementation is written as if the numBits argument is a "number of bits" value, but then proceeds to allocate at least an extra long in the backing array... There are several options here: Only update the doc comment (like I did) so unsuspecting users don't get an unexpected performance hit when manipulating equal sized bitsets, but then the name stays awkward. Fix the implementation and update all locations in lucene/solr where it is used (but this may/will affect custom modules without warning) Rename the methods and "numBits" argument (ensureIndexAvailable anyone?). This will break the compilation of custom modules, but it's @lucene.internal anyway, not a public api.
        Hide
        Adrien Grand added a comment -

        Thanks Luc. I have some questions/comments:

        • Lots of javadocs mention "Depends on the ghost bits being clear!" (eg. cardinality()) but since your PR checks that there are no ghost bits when creating or growing the bit set, this is not something that consumers of these APIs should worry about?
        • Do we actually need the new "checkIfEmpty" method? I suspect checking the return values of nextSetBit(0) value would be almost as fast?
        • In unit tests, you should use the random() object instead of creating new Random objects so that tests remain reproducible (when there is a test failure, we know the seed that triggered this failure and running the tests with the same seed reproduces the same sequence of random numbers)

        At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed. Thanks for fixing the docs for now but in the longer term I think we should fix it to work more like ArrayUtil.grow.

        Also thank you for adding javadocs to undocumented methods!

        Show
        Adrien Grand added a comment - Thanks Luc. I have some questions/comments: Lots of javadocs mention "Depends on the ghost bits being clear!" (eg. cardinality()) but since your PR checks that there are no ghost bits when creating or growing the bit set, this is not something that consumers of these APIs should worry about? Do we actually need the new "checkIfEmpty" method? I suspect checking the return values of nextSetBit(0) value would be almost as fast? In unit tests, you should use the random() object instead of creating new Random objects so that tests remain reproducible (when there is a test failure, we know the seed that triggered this failure and running the tests with the same seed reproduces the same sequence of random numbers) At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed. Thanks for fixing the docs for now but in the longer term I think we should fix it to work more like ArrayUtil.grow. Also thank you for adding javadocs to undocumented methods!
        Hide
        Adrien Grand added a comment -

        At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed.

        Oops I commented before seeing your last comment. I think we should do option 1 for now (ie. in this patch) but then open another issue and implement option 2 if that works for you.

        Show
        Adrien Grand added a comment - At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed. Oops I commented before seeing your last comment. I think we should do option 1 for now (ie. in this patch) but then open another issue and implement option 2 if that works for you.
        Hide
        Luc Vanlerberghe added a comment -
        • I wasn't sure whether to include the comment or not, but decided to do so anyway since the check is in an assert (and hence will only be triggered when asserts are enabled). I used an assert to avoid affecting performance on proper use. I think the user should be warned not to recycle any 'old' FixedBitSet that happens not to have any bits set in the numbits range without clearing it properly. Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods.
        • I needed the checkEmpty method I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size...
        • I did use the random() object, I just stored it in a local variable because I needed it several times in a row (should be ok from reading the doc comment on LuceneTestCase.random())

        On ensureCapacity: ok for opening an new issue after this one is closed...

        Show
        Luc Vanlerberghe added a comment - I wasn't sure whether to include the comment or not, but decided to do so anyway since the check is in an assert (and hence will only be triggered when asserts are enabled). I used an assert to avoid affecting performance on proper use. I think the user should be warned not to recycle any 'old' FixedBitSet that happens not to have any bits set in the numbits range without clearing it properly. Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods. I needed the checkEmpty method I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size... I did use the random() object, I just stored it in a local variable because I needed it several times in a row (should be ok from reading the doc comment on LuceneTestCase.random()) On ensureCapacity: ok for opening an new issue after this one is closed...
        Hide
        Adrien Grand added a comment -

        Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods.

        +1

        I needed the checkEmpty method I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size...

        OK I see. Then can you rename to isEmpty() for consistency with java collections?

        I did use the random() object

        Sorry I somehow misread your code!

        Show
        Adrien Grand added a comment - Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods. +1 I needed the checkEmpty method I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size... OK I see. Then can you rename to isEmpty() for consistency with java collections? I did use the random() object Sorry I somehow misread your code!
        Hide
        Luc Vanlerberghe added a comment -

        OK I see. Then can you rename to isEmpty() for consistency with java collections?

        I would, but there's this comment in the code:

          // NOTE: no .isEmpty() here because that's trappy (ie,
          // typically isEmpty is low cost, but this one wouldn't
          // be)
        

        I'm open to suggestions for the name though (Perhaps I should revert to scanIsEmpty like I had before?)

        Show
        Luc Vanlerberghe added a comment - OK I see. Then can you rename to isEmpty() for consistency with java collections? I would, but there's this comment in the code: // NOTE: no .isEmpty() here because that's trappy (ie, // typically isEmpty is low cost, but this one wouldn't // be) I'm open to suggestions for the name though (Perhaps I should revert to scanIsEmpty like I had before?)
        Hide
        Adrien Grand added a comment -

        Hmm I agree it is a valid concern... Then maybe we should not add this method and let callers do boolean empty = bitSet.length() == 0 || bitSet.nextSetBit(0) == DocIdSetIterator.NO_MORE_DOCS; themselves?

        Show
        Adrien Grand added a comment - Hmm I agree it is a valid concern... Then maybe we should not add this method and let callers do boolean empty = bitSet.length() == 0 || bitSet.nextSetBit(0) == DocIdSetIterator.NO_MORE_DOCS; themselves?
        Hide
        Luc Vanlerberghe added a comment -
        • I moved the "Depends on the ghost bits being clear!" line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled)
        • I renamed checkIfEmpty to scanIsEmpty and updated comments.

        I don't see the problem with having a scanIsEmpty method available for those willing to look for it. It probably will make a performance difference for small bitsets because it avoids the overhead of two method calls and the numberOfTrailingZeros call in nextSetBit on the first non-zero long.

        LUCENE-5856 shows that even removing a useless & 0x3f from *BitSet.get and company can have a noticeable effect (albeit probably in tight inner loops)
        By the way, I did remove a useless & 0x3f from FixedBitSet.flip in this patch as well

        Show
        Luc Vanlerberghe added a comment - I moved the "Depends on the ghost bits being clear!" line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled) I renamed checkIfEmpty to scanIsEmpty and updated comments. I don't see the problem with having a scanIsEmpty method available for those willing to look for it. It probably will make a performance difference for small bitsets because it avoids the overhead of two method calls and the numberOfTrailingZeros call in nextSetBit on the first non-zero long. LUCENE-5856 shows that even removing a useless & 0x3f from *BitSet.get and company can have a noticeable effect (albeit probably in tight inner loops) By the way, I did remove a useless & 0x3f from FixedBitSet.flip in this patch as well
        Hide
        Adrien Grand added a comment -

        I don't see the problem with having a scanIsEmpty method available for those willing to look for it.

        But it doesn't bring anything either since this method is not used anywhere for now?

        I moved the "Depends on the ghost bits being clear!" line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled)

        Thanks. Can you just remove the duplicated javadocs from oal.util.BitSet, the javadoc tool should already inherit javadocs from overridden methods?

        I did remove a useless & 0x3f from FixedBitSet.flip in this patch as well

        Good catch!

        Show
        Adrien Grand added a comment - I don't see the problem with having a scanIsEmpty method available for those willing to look for it. But it doesn't bring anything either since this method is not used anywhere for now? I moved the "Depends on the ghost bits being clear!" line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled) Thanks. Can you just remove the duplicated javadocs from oal.util.BitSet, the javadoc tool should already inherit javadocs from overridden methods? I did remove a useless & 0x3f from FixedBitSet.flip in this patch as well Good catch!
        Hide
        Luc Vanlerberghe added a comment - - edited

        I updated my pull request:

        • Deleted obsolete doc comments on @Override methods
        • TestFixedBitSet: Made an accidentally public method private again
        • org.apache.solr.search.TestFiltering: Corrected possible generation of 'ghost' bits for FixedBitSet

        About scanIsEmpty():

        But it doesn't bring anything either since this method is not used anywhere for now?

        I did find a case where it would be useful: In oals.SloppyPhraseScorer there's this code:

            // collisions resolved, now re-queue
            // empty (partially) the queue until seeing all pps advanced for resolving collisions
            int n = 0;
            // TODO would be good if we can avoid calling cardinality() in each iteration!
            int numBits = bits.length(); // larges bit we set
            while (bits.cardinality() > 0) {
              PhrasePositions pp2 = pq.pop();
              rptStack[n++] = pp2;
              if (pp2.rptGroup >= 0 
                  && pp2.rptInd < numBits  // this bit may not have been set
                  && bits.get(pp2.rptInd)) {
                bits.clear(pp2.rptInd);
              }
            }
        

        and some places that assert that .cardinality() == 0.

        Show
        Luc Vanlerberghe added a comment - - edited I updated my pull request: Deleted obsolete doc comments on @Override methods TestFixedBitSet: Made an accidentally public method private again org.apache.solr.search.TestFiltering: Corrected possible generation of 'ghost' bits for FixedBitSet About scanIsEmpty(): But it doesn't bring anything either since this method is not used anywhere for now? I did find a case where it would be useful: In oals.SloppyPhraseScorer there's this code: // collisions resolved, now re-queue // empty (partially) the queue until seeing all pps advanced for resolving collisions int n = 0; // TODO would be good if we can avoid calling cardinality() in each iteration! int numBits = bits.length(); // larges bit we set while (bits.cardinality() > 0) { PhrasePositions pp2 = pq.pop(); rptStack[n++] = pp2; if (pp2.rptGroup >= 0 && pp2.rptInd < numBits // this bit may not have been set && bits.get(pp2.rptInd)) { bits.clear(pp2.rptInd); } } and some places that assert that .cardinality() == 0.
        Hide
        Adrien Grand added a comment -

        To be honest this doesn't look like a valid use-case of scanIfEmpty to me. As the code comment suggests, we should rewrite this code to not check that the bitset is empty in a loop. In practice we are trying to move away from these linear-time operations of FixedBitSet (nextDoc(), cardinality(), ...) as much as we can. For instance when we use this class for query execution (for multi-term queries mainly), we strive to only use a FixedBitSet if we know that the set of documents that we are going to store is dense enough for FixedBitSet to not be slower than a sparser implementation.

        Other than the new unused methods (scanIfEmpty and LongBitset.flip) the PR looks good to me now.

        Show
        Adrien Grand added a comment - To be honest this doesn't look like a valid use-case of scanIfEmpty to me. As the code comment suggests, we should rewrite this code to not check that the bitset is empty in a loop. In practice we are trying to move away from these linear-time operations of FixedBitSet (nextDoc(), cardinality(), ...) as much as we can. For instance when we use this class for query execution (for multi-term queries mainly), we strive to only use a FixedBitSet if we know that the set of documents that we are going to store is dense enough for FixedBitSet to not be slower than a sparser implementation. Other than the new unused methods (scanIfEmpty and LongBitset.flip) the PR looks good to me now.
        Hide
        Luc Vanlerberghe added a comment -

        I didn't check if these where valid use cases or not, just that they happen to be in the code, probably for lack of an easy alternative to call.

        While developing custom modules for solr, I took advantage of the existence of FixedBitSet (it was called OpenBitSet back then), but found the lack of an isEmpty() method annoying. There must be others like me...

        I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the import java.util.BitSet from TestLongBitSet as well)
        There are plenty of methods in LongBitSet.java that aren't used yet (even ensureCapacity that was corrected in LUCENE-6409)
        Although I agree that oal.util shouldn't become a bag of classes and methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard.

        Show
        Luc Vanlerberghe added a comment - I didn't check if these where valid use cases or not, just that they happen to be in the code, probably for lack of an easy alternative to call. While developing custom modules for solr, I took advantage of the existence of FixedBitSet (it was called OpenBitSet back then), but found the lack of an isEmpty() method annoying. There must be others like me... I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the import java.util.BitSet from TestLongBitSet as well) There are plenty of methods in LongBitSet.java that aren't used yet (even ensureCapacity that was corrected in LUCENE-6409 ) Although I agree that oal.util shouldn't become a bag of classes and methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard.
        Hide
        Adrien Grand added a comment -

        Although I agree that oal.util shouldn't become a bag of classes and methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard.

        +1 on that

        I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the import java.util.BitSet from TestLongBitSet as well)

        Actually I just noticed that FixedBitSet.flip is unused: it is only used by xor which is not used.

        I'm not happy that I'm delaying your good fixes with concerns around unused methods so if it works for you, I'll merge your pull-request as-is and will have a look at removing unused stuff later.

        Show
        Adrien Grand added a comment - Although I agree that oal.util shouldn't become a bag of classes and methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard. +1 on that I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the import java.util.BitSet from TestLongBitSet as well) Actually I just noticed that FixedBitSet.flip is unused: it is only used by xor which is not used. I'm not happy that I'm delaying your good fixes with concerns around unused methods so if it works for you, I'll merge your pull-request as-is and will have a look at removing unused stuff later.
        Hide
        ASF subversion and git services added a comment -

        Commit 1676617 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1676617 ]

        LUCENE-6427: Added assertion about the presence of ghost bits in (Fixed|Long)BitSet.

        Show
        ASF subversion and git services added a comment - Commit 1676617 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1676617 ] LUCENE-6427 : Added assertion about the presence of ghost bits in (Fixed|Long)BitSet.
        Hide
        ASF subversion and git services added a comment -

        Commit 1676624 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1676624 ]

        LUCENE-6427: Added assertion about the presence of ghost bits in (Fixed|Long)BitSet.

        Show
        ASF subversion and git services added a comment - Commit 1676624 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1676624 ] LUCENE-6427 : Added assertion about the presence of ghost bits in (Fixed|Long)BitSet.
        Hide
        Adrien Grand added a comment -

        Committed. Thanks Luc!

        Show
        Adrien Grand added a comment - Committed. Thanks Luc!
        Hide
        ASF GitHub Bot added a comment -

        Github user jpountz commented on the pull request:

        https://github.com/apache/lucene-solr/pull/142#issuecomment-97202666

        Merged, see https://issues.apache.org/jira/browse/LUCENE-6427. (Unfortunately I can't close.)

        Show
        ASF GitHub Bot added a comment - Github user jpountz commented on the pull request: https://github.com/apache/lucene-solr/pull/142#issuecomment-97202666 Merged, see https://issues.apache.org/jira/browse/LUCENE-6427 . (Unfortunately I can't close.)
        Hide
        Luc Vanlerberghe added a comment -

        Thanks for committing!

        Actually, just mentioning "This closes #142" in the commit message should be sufficient to trigger an ASF bot to close the pull request:
        (Just like my mentioning LUCENE-6427 in the pull request title triggered an automatic comment in the jira issue)
        See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow

        I'll close it manually, no problem.

        Show
        Luc Vanlerberghe added a comment - Thanks for committing! Actually, just mentioning "This closes #142" in the commit message should be sufficient to trigger an ASF bot to close the pull request: (Just like my mentioning LUCENE-6427 in the pull request title triggered an automatic comment in the jira issue) See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow I'll close it manually, no problem.
        Hide
        ASF GitHub Bot added a comment -

        Github user LucVL closed the pull request at:

        https://github.com/apache/lucene-solr/pull/142

        Show
        ASF GitHub Bot added a comment - Github user LucVL closed the pull request at: https://github.com/apache/lucene-solr/pull/142
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Luc Vanlerberghe
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development