Lucene - Core
  1. Lucene - Core
  2. LUCENE-6409

LongBitSet.ensureCapacity overflows on numBits > Integer.MaxValue

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      LongBitSet.ensureCapacity calculates the number of longs required to store the number of bits correctly and allocates a long[] accordingly, but then shifts the array length (which is an int!) left by 6 bits. The int should be cast to long before performing the shift.

        Activity

        Hide
        ASF GitHub Bot added a comment -

        GitHub user LucVL opened a pull request:

        https://github.com/apache/lucene-solr/pull/140

        LUCENE-6409 BitSetFixes

        The pull request actually contains a few separate commits:

        • Demonstrates the bug in a simple test case and adds a patch.
        • Corrects ensureCapacity
        • Optimized bits2words for both FixedBitSet and LongBitSet

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/LucVL/lucene-solr lucene-6409-BitSetFixes

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/lucene-solr/pull/140.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #140


        commit 44adaaa577839225f60a355fc3538631e0b9e965
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2014-12-12T15:48:23Z

        Demonstrate bug in LongBitSet.ensureCapacity

        commit bb4257da61d45ab559ff8e3b23d9aa1d3981d366
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2014-12-12T15:53:54Z

        LongBitSet.ensureCapacity: Cast to long before shifting left!

        commit a70e1d4b1b654e2e3655c289485af28ecdc9cd92
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2014-12-12T16:17:09Z

        Avoid an "if" in bits2words and corrected some comments

        commit fd1cce82acd74a12aa4b022582985dfb8c8e4193
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-08T13:48:51Z

        Fixed grammar

        commit 849dbf7a983040b50b32bd798ad191f8dd39bffb
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-04-08T13:49:45Z

        Merge branch 'trunk' into BitSetFixes


        Show
        ASF GitHub Bot added a comment - GitHub user LucVL opened a pull request: https://github.com/apache/lucene-solr/pull/140 LUCENE-6409 BitSetFixes The pull request actually contains a few separate commits: Demonstrates the bug in a simple test case and adds a patch. Corrects ensureCapacity Optimized bits2words for both FixedBitSet and LongBitSet You can merge this pull request into a Git repository by running: $ git pull https://github.com/LucVL/lucene-solr lucene-6409-BitSetFixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/140.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #140 commit 44adaaa577839225f60a355fc3538631e0b9e965 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2014-12-12T15:48:23Z Demonstrate bug in LongBitSet.ensureCapacity commit bb4257da61d45ab559ff8e3b23d9aa1d3981d366 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2014-12-12T15:53:54Z LongBitSet.ensureCapacity: Cast to long before shifting left! commit a70e1d4b1b654e2e3655c289485af28ecdc9cd92 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2014-12-12T16:17:09Z Avoid an "if" in bits2words and corrected some comments commit fd1cce82acd74a12aa4b022582985dfb8c8e4193 Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-08T13:48:51Z Fixed grammar commit 849dbf7a983040b50b32bd798ad191f8dd39bffb Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com> Date: 2015-04-08T13:49:45Z Merge branch 'trunk' into BitSetFixes
        Hide
        Adrien Grand added a comment -

        Thanks for the fixes! I'm wondering that this new approach might cause bugs if you create bit sets of length 0. Can you add tests for this case too?

        Show
        Adrien Grand added a comment - Thanks for the fixes! I'm wondering that this new approach might cause bugs if you create bit sets of length 0. Can you add tests for this case too?
        Hide
        Luc Vanlerberghe added a comment -

        While looking over the code, I have some more remarks on Fixed-/LongBitSet that should be investigated:

        • Harmonize the use of numWords vs. bits.length vs. numBits
          • E.g.: cardinality scans up to bits.length, while "or" asserts on index<numWords
        • Performance: If a BitSet is allocated with n bits, ensureCapacity with the same number n shouldn't grow the BitSet
          • Either both the constructor and ensureCapacity should allocate a larger array than really needed or neither. ensureCapacity contains:
                if (numBits < bits.length()) {
                  return bits;
                } else {
                  int numWords = bits2words(numBits);
                  long[] arr = bits.getBits();
                  if (numWords >= arr.length) {
                    arr = ArrayUtil.grow(arr, numWords + 1);
                  }
                  return new FixedBitSet(arr, arr.length << 6);
                }
            

            The first "if" will fail and the second "if" will succeed, causing arr to be grown to at least 1 more...

        Show
        Luc Vanlerberghe added a comment - While looking over the code, I have some more remarks on Fixed-/LongBitSet that should be investigated: Harmonize the use of numWords vs. bits.length vs. numBits E.g.: cardinality scans up to bits.length, while "or" asserts on index<numWords Performance: If a BitSet is allocated with n bits, ensureCapacity with the same number n shouldn't grow the BitSet Either both the constructor and ensureCapacity should allocate a larger array than really needed or neither. ensureCapacity contains: if (numBits < bits.length()) { return bits; } else { int numWords = bits2words(numBits); long [] arr = bits.getBits(); if (numWords >= arr.length) { arr = ArrayUtil.grow(arr, numWords + 1); } return new FixedBitSet(arr, arr.length << 6); } The first "if" will fail and the second "if" will succeed, causing arr to be grown to at least 1 more...
        Hide
        Luc Vanlerberghe added a comment - - edited

        I added testBits2Words on TestLongBitSet with various values, concentrating on boundary cases

        I found some more issues that may cause spurious side effects.
        While trying to correct them, it turns out that the tests do 'illegal' things (like setting bits outside the [0, numBits[ range by manipulating the backing array directly)
        This should cause things like cardinality() to fail (because it doesn't use a mask for the last long, probably for performance), but apparently that's never tested.

        I propose that creating a new Long-/FixedBitSet based on an existing backing array should assert (in debug mode) that all the 'ghost' bits with index >= numBits should be clear.

        Perhaps I should create a new issue for this to allow the ensureCapacity bug to be closed first.

        Show
        Luc Vanlerberghe added a comment - - edited I added testBits2Words on TestLongBitSet with various values, concentrating on boundary cases I found some more issues that may cause spurious side effects. While trying to correct them, it turns out that the tests do 'illegal' things (like setting bits outside the [0, numBits[ range by manipulating the backing array directly) This should cause things like cardinality() to fail (because it doesn't use a mask for the last long, probably for performance), but apparently that's never tested. I propose that creating a new Long-/FixedBitSet based on an existing backing array should assert (in debug mode) that all the 'ghost' bits with index >= numBits should be clear. Perhaps I should create a new issue for this to allow the ensureCapacity bug to be closed first.
        Hide
        Adrien Grand added a comment -

        I propose that creating a new Long-/FixedBitSet based on an existing backing array should assert (in debug mode) that all the 'ghost' bits with index >= numBits should be clear.

        +1

        Perhaps I should create a new issue for this to allow the ensureCapacity bug to be closed first.

        It works for me! If you think you are done with this fix, I'll give it a deeper look and merge it.

        Show
        Adrien Grand added a comment - I propose that creating a new Long-/FixedBitSet based on an existing backing array should assert (in debug mode) that all the 'ghost' bits with index >= numBits should be clear. +1 Perhaps I should create a new issue for this to allow the ensureCapacity bug to be closed first. It works for me! If you think you are done with this fix, I'll give it a deeper look and merge it.
        Hide
        ASF subversion and git services added a comment -

        Commit 1673123 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1673123 ]

        LUCENE-6409: Fixed integer overflow in LongBitSet.ensureCapacity.

        Show
        ASF subversion and git services added a comment - Commit 1673123 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1673123 ] LUCENE-6409 : Fixed integer overflow in LongBitSet.ensureCapacity.
        Hide
        ASF subversion and git services added a comment -

        Commit 1673124 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673124 ]

        LUCENE-6409: Fixed integer overflow in LongBitSet.ensureCapacity.

        Show
        ASF subversion and git services added a comment - Commit 1673124 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1673124 ] LUCENE-6409 : Fixed integer overflow in LongBitSet.ensureCapacity.
        Hide
        Adrien Grand added a comment -

        Thanks Luc, I committed your changes. Let's have another JIRA issue to ensure that bits beyond numBits remain clear as you suggested?

        Show
        Adrien Grand added a comment - Thanks Luc, I committed your changes. Let's have another JIRA issue to ensure that bits beyond numBits remain clear as you suggested?
        Hide
        ASF GitHub Bot added a comment -

        Github user LucVL closed the pull request at:

        https://github.com/apache/lucene-solr/pull/140

        Show
        ASF GitHub Bot added a comment - Github user LucVL closed the pull request at: https://github.com/apache/lucene-solr/pull/140
        Hide
        ASF GitHub Bot added a comment -

        Github user LucVL commented on the pull request:

        https://github.com/apache/lucene-solr/pull/140#issuecomment-92720200

        Committed by Adrien Grand yesterday.
        Thanks Adrian!

        Show
        ASF GitHub Bot added a comment - Github user LucVL commented on the pull request: https://github.com/apache/lucene-solr/pull/140#issuecomment-92720200 Committed by Adrien Grand yesterday. Thanks Adrian!
        Hide
        Luc Vanlerberghe added a comment -

        I created a separate JIRA issue for the 'ghost bits' issue

        LUCENE-6427

        Show
        Luc Vanlerberghe added a comment - I created a separate JIRA issue for the 'ghost bits' issue LUCENE-6427
        Hide
        ASF subversion and git services added a comment -

        Commit 1675134 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1675134 ]

        LUCENE-6409: Mark TestLongBitSet.testHugeCapacity as @Nightly.

        Show
        ASF subversion and git services added a comment - Commit 1675134 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1675134 ] LUCENE-6409 : Mark TestLongBitSet.testHugeCapacity as @Nightly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1675135 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1675135 ]

        LUCENE-6409: Mark TestLongBitSet.testHugeCapacity as @Nightly.

        Show
        ASF subversion and git services added a comment - Commit 1675135 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1675135 ] LUCENE-6409 : Mark TestLongBitSet.testHugeCapacity as @Nightly.
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Luc Vanlerberghe
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development