Lucene - Core
  1. Lucene - Core
  2. LUCENE-5236

Use broadword bit selection in EliasFanoDecoder

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Try and speed up decoding

      1. TestDocIdSetBenchmark.java
        9 kB
        Paul Elschot
      2. LUCENE-5236.patch
        31 kB
        Paul Elschot
      3. LUCENE-5236.patch
        36 kB
        Paul Elschot
      4. LUCENE-5236.patch
        40 kB
        Paul Elschot
      5. LUCENE-5236.patch
        43 kB
        Paul Elschot

        Issue Links

          Activity

          Paul Elschot created issue -
          Hide
          Paul Elschot added a comment -

          This consists of using BroadWord.select9 instead of the linear search within the upper bits long that represents highTarget in EliasFanoDecoder.advanceToHighValue .

          One way to do that is to inline methods nextHighValue, toAfterCurrentHighBit, toNextHighLong and getCurrentRightShift until only basic java operations are in the code.
          Then the linear search on curHighLong becomes visible.
          Then hopefully some simplifications can be done, best by using assert's and testing.
          Finally BroadWord.select9 can be introduced.

          When I tried this, some tests failed when I tried to simplify: TestEliasFanoSequence and also TestEliasFanoDocIdSet .
          So I'll simply restart from scratch (trunk), but I have no idea when I have the time to do this.

          Show
          Paul Elschot added a comment - This consists of using BroadWord.select9 instead of the linear search within the upper bits long that represents highTarget in EliasFanoDecoder.advanceToHighValue . One way to do that is to inline methods nextHighValue, toAfterCurrentHighBit, toNextHighLong and getCurrentRightShift until only basic java operations are in the code. Then the linear search on curHighLong becomes visible. Then hopefully some simplifications can be done, best by using assert's and testing. Finally BroadWord.select9 can be introduced. When I tried this, some tests failed when I tried to simplify: TestEliasFanoSequence and also TestEliasFanoDocIdSet . So I'll simply restart from scratch (trunk), but I have no idea when I have the time to do this.
          Paul Elschot made changes -
          Field Original Value New Value
          Link This issue depends upon LUCENE-5109 [ LUCENE-5109 ]
          Hide
          Paul Elschot added a comment -

          LUCENE-5109 also modifies the EliasFanoDecoder.advanceToValue method.

          Show
          Paul Elschot added a comment - LUCENE-5109 also modifies the EliasFanoDecoder.advanceToValue method.
          Hide
          Paul Elschot added a comment -

          The patch of 5 Oct 2013 includes my modifications for LUCENE-5109.
          The patch also adds some tests.

          This passes TestEliasFanoSequence and TestEliasFanoDocIdSet.

          I did not do any performance tests for broadword bit selection yet.
          The times taken by some of the longer tests are encouraging, but that does not mean much.

          Show
          Paul Elschot added a comment - The patch of 5 Oct 2013 includes my modifications for LUCENE-5109 . The patch also adds some tests. This passes TestEliasFanoSequence and TestEliasFanoDocIdSet. I did not do any performance tests for broadword bit selection yet. The times taken by some of the longer tests are encouraging, but that does not mean much.
          Paul Elschot made changes -
          Attachment LUCENE-5236.patch [ 12607002 ]
          Hide
          Paul Elschot added a comment -

          See http://people.apache.org/~jpountz/doc_id_sets.html for the EliasFanoDocIdSet without index and without broadword bit selection.

          With index and broadword bit selection on my 32 bit machine, worst case performance for load factor -1 (1/10) for any advance(), relative to FixedBitSet, is -3.42 (2 log scale) for advance(3571).
          The other advance() cases have better worst cases, so broadword bit selection really helps there, too.

          Show
          Paul Elschot added a comment - See http://people.apache.org/~jpountz/doc_id_sets.html for the EliasFanoDocIdSet without index and without broadword bit selection. With index and broadword bit selection on my 32 bit machine, worst case performance for load factor -1 (1/10) for any advance(), relative to FixedBitSet, is -3.42 (2 log scale) for advance(3571). The other advance() cases have better worst cases, so broadword bit selection really helps there, too.
          Hide
          Paul Elschot added a comment -

          Mostly the same patch as the one of 5 October, but with improved javadocs, mostly to mention where in the decoder the new index is used and not used.

          Also a minor API change (everything is still marked experimental): renamed EliasFanoDecoder.index() to currentIndex() to avoid the impression of setting the index.

          Added EliasFanoDecoder.currentValue() for use in combination with advanceToIndex().

          Show
          Paul Elschot added a comment - Mostly the same patch as the one of 5 October, but with improved javadocs, mostly to mention where in the decoder the new index is used and not used. Also a minor API change (everything is still marked experimental): renamed EliasFanoDecoder.index() to currentIndex() to avoid the impression of setting the index. Added EliasFanoDecoder.currentValue() for use in combination with advanceToIndex().
          Paul Elschot made changes -
          Attachment LUCENE-5236.patch [ 12607233 ]
          Hide
          Paul Elschot added a comment -

          The version of DocIdSetBenchmark from LUCENE-5101 that I am currently using.

          Main changes:

          • changed it into a test case to make it easier to use, the main() in still in there.
          • added a few load factors.
          • ran into a score value problem when exhausting an iterator could only be done once (for larger sets), so I changed the score value into float and divided it by the "actual" run time.
          Show
          Paul Elschot added a comment - The version of DocIdSetBenchmark from LUCENE-5101 that I am currently using. Main changes: changed it into a test case to make it easier to use, the main() in still in there. added a few load factors. ran into a score value problem when exhausting an iterator could only be done once (for larger sets), so I changed the score value into float and divided it by the "actual" run time.
          Paul Elschot made changes -
          Attachment TestDocIdSetBenchmark.java [ 12607456 ]
          Adrien Grand made changes -
          Assignee Adrien Grand [ jpountz ]
          Hide
          Paul Elschot added a comment - - edited

          The patch of 13 Oct 2013 also includes a method name correction in the BroadWord class (see at LUCENE-5098), and it removes some unused code and one or two commented code lines.

          Show
          Paul Elschot added a comment - - edited The patch of 13 Oct 2013 also includes a method name correction in the BroadWord class (see at LUCENE-5098 ), and it removes some unused code and one or two commented code lines.
          Paul Elschot made changes -
          Attachment LUCENE-5236.patch [ 12608200 ]
          Hide
          Adrien Grand added a comment -

          /** @return See {@link EliasFanoEncoder#sufficientlySmallerThanBitSet(long, long)} */

          We try to put text rather than just javadoc tags in javadocs (ie. "Return X" rather than "@return X") so that the method summary doesn't appear empty on the class-level javadocs.

          private static long numLongsForBits(long numBits) { // Note: int version in FixedBitSet.bits2words()

          I think you can directly use OpenBitSet.bits2words here? It already works with longs.

          There are some tabs mixed up with spaces and a broken javadoc link (see ant precommit) but the patch otherwise looks really good!

          With index and broadword bit selection on my 32 bit machine, worst case performance for load factor -1 (1/10) for any advance(), relative to FixedBitSet, is -3.42 (2 log scale) for advance(3571).
          The other advance() cases have better worst cases, so broadword bit selection really helps there, too.

          Wow! This looks excellent! I will run the benchmark on my 64-bits machine tomorrow to see if I get similar results.

          Show
          Adrien Grand added a comment - /** @return See {@link EliasFanoEncoder#sufficientlySmallerThanBitSet(long, long)} */ We try to put text rather than just javadoc tags in javadocs (ie. "Return X" rather than "@return X") so that the method summary doesn't appear empty on the class-level javadocs. private static long numLongsForBits(long numBits) { // Note: int version in FixedBitSet.bits2words() I think you can directly use OpenBitSet.bits2words here? It already works with longs. There are some tabs mixed up with spaces and a broken javadoc link (see ant precommit ) but the patch otherwise looks really good! With index and broadword bit selection on my 32 bit machine, worst case performance for load factor -1 (1/10) for any advance(), relative to FixedBitSet, is -3.42 (2 log scale) for advance(3571). The other advance() cases have better worst cases, so broadword bit selection really helps there, too. Wow! This looks excellent! I will run the benchmark on my 64-bits machine tomorrow to see if I get similar results.
          Paul Elschot made changes -
          Attachment LUCENE-5236.patch [ 12608222 ]
          Hide
          Paul Elschot added a comment -

          The second patch of 13 Oct 2013 improves the javadocs of EliasFanoDocIdSet, and adds the condition to sufficientlySmallerThanBitSet to always prefer a bit set when it uses no more than 4 longs.

          ant validate
          does not complain about tabs anymore here.

          I could not find a broken link in the javadocs, maybe because I'm using java 1.7:
          ant documentation-lint
          fails with the error message "Linting documentation HTML is not supported on this Java version (1.7)".
          There is an http:// reference to archiv.org in the javadoc as plain text (it is not marked up as a link), and just now it worked fine as a link.

          OpenBitSet.bits2words will fail when the number of bits gets just over the maximum int value, so I prefer not to use it for now.

          I am looking forward to the benchmark results on a 64 bits machine.

          In the decoder there is a test to use naive bit selection when the needed rank is less then or equal to 8, and otherwise use broadword bit selection.
          I expect this constant 8 to be close to the optimal value on a current 64 bit machine, but some tuning may be needed.
          See also my first comment of 11 July 2013 at LUCENE-5098.

          Show
          Paul Elschot added a comment - The second patch of 13 Oct 2013 improves the javadocs of EliasFanoDocIdSet, and adds the condition to sufficientlySmallerThanBitSet to always prefer a bit set when it uses no more than 4 longs. ant validate does not complain about tabs anymore here. I could not find a broken link in the javadocs, maybe because I'm using java 1.7: ant documentation-lint fails with the error message "Linting documentation HTML is not supported on this Java version (1.7)". There is an http:// reference to archiv.org in the javadoc as plain text (it is not marked up as a link), and just now it worked fine as a link. OpenBitSet.bits2words will fail when the number of bits gets just over the maximum int value, so I prefer not to use it for now. I am looking forward to the benchmark results on a 64 bits machine. In the decoder there is a test to use naive bit selection when the needed rank is less then or equal to 8, and otherwise use broadword bit selection. I expect this constant 8 to be close to the optimal value on a current 64 bit machine, but some tuning may be needed. See also my first comment of 11 July 2013 at LUCENE-5098 .
          Hide
          Adrien Grand added a comment -

          Thanks, ant precommit and tests ran successfully.

          fails with the error message "Linting documentation HTML is not supported on this Java version (1.7)".

          I don't clearly understand what triggers this message but it works for me although I have 1.7 as well.

          OpenBitSet.bits2words will fail when the number of bits gets just over the maximum int value, so I prefer not to use it for now.

          You mean you don't like the fact that the cast is unchecked, right? I guess we should fix it and use it?

          I am looking forward to the benchmark results on a 64 bits machine.

          Here it is, and it is excellent! http://people.apache.org/~jpountz/doc_id_sets.html

          I'll commit your latest patch soon if you have no objection.

          Show
          Adrien Grand added a comment - Thanks, ant precommit and tests ran successfully. fails with the error message "Linting documentation HTML is not supported on this Java version (1.7)". I don't clearly understand what triggers this message but it works for me although I have 1.7 as well. OpenBitSet.bits2words will fail when the number of bits gets just over the maximum int value, so I prefer not to use it for now. You mean you don't like the fact that the cast is unchecked, right? I guess we should fix it and use it? I am looking forward to the benchmark results on a 64 bits machine. Here it is, and it is excellent! http://people.apache.org/~jpountz/doc_id_sets.html I'll commit your latest patch soon if you have no objection.
          Hide
          Paul Elschot added a comment -

          I have no objection at all to committing. Needless to say that I am pleasantly surprised with these benchmark results.

          On my 32 bit machine FixedBitSet does better in general, i.e. the others do worse in the results.
          But the trends between the others are quite similar.

          So it's really time to upgrade to 64-bit. Mostly talking to myself, but in case there still are older production machines out there...

          Fixing FixedBitSet.bits2words involves changing the int numBits arguments in the FixedBitSet constructors to long,
          and testing whether the result of bits2words is smaller than max int.
          I don't see Lucene segments growing above 2G docs real soon now, so this is not urgent for FixedBitSet.

          Show
          Paul Elschot added a comment - I have no objection at all to committing. Needless to say that I am pleasantly surprised with these benchmark results. On my 32 bit machine FixedBitSet does better in general, i.e. the others do worse in the results. But the trends between the others are quite similar. So it's really time to upgrade to 64-bit. Mostly talking to myself, but in case there still are older production machines out there... Fixing FixedBitSet.bits2words involves changing the int numBits arguments in the FixedBitSet constructors to long, and testing whether the result of bits2words is smaller than max int. I don't see Lucene segments growing above 2G docs real soon now, so this is not urgent for FixedBitSet.
          Hide
          ASF subversion and git services added a comment -

          Commit 1532388 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1532388 ]

          LUCENE-5236: EliasFanoDocIdSet now has an index and uses broadword bit selection to speed-up advance()

          Show
          ASF subversion and git services added a comment - Commit 1532388 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1532388 ] LUCENE-5236 : EliasFanoDocIdSet now has an index and uses broadword bit selection to speed-up advance()
          Hide
          ASF subversion and git services added a comment -

          Commit 1532390 from Adrien Grand in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1532390 ]

          LUCENE-5236: EliasFanoDocIdSet now has an index and uses broadword bit selection to speed-up advance()

          Show
          ASF subversion and git services added a comment - Commit 1532390 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1532390 ] LUCENE-5236 : EliasFanoDocIdSet now has an index and uses broadword bit selection to speed-up advance()
          Hide
          Adrien Grand added a comment -

          Committed. Thanks Paul!

          Show
          Adrien Grand added a comment - Committed. Thanks Paul!
          Adrien Grand made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 4.6 [ 12324999 ]
          Resolution Fixed [ 1 ]
          Adrien Grand made changes -
          Link This issue relates to LUCENE-5109 [ LUCENE-5109 ]
          Hide
          Sebastiano Vigna added a comment -

          Sorry guys—just happened to read at this thread. My 2€¢:

          • Rank9/Select9 and full ranking structures, not rank/select-in-a-word algorithms.
          • I'm using Long.bitCount(), Long.numberOfTralilingZeros() etc. everywhere since Jan 2013. Intrinsification is a bet, but it pays, and using them will convince them IBM and others to intrinsify. If only Oracle would add Long.clearLowestBitSet() ...
          • The alternative algorithm for sideways addition (a.k.a. popcount) should be called "broadword" or Wilkes–Wheeler–Gill.
          • Selection gets better and better every day. http://vigna.di.unimi.it/select.php tries to keep track of the improvement. The current code in it.unimi.dsi.bits.Fast is the best I'm aware of.

          Ciao!

          Show
          Sebastiano Vigna added a comment - Sorry guys—just happened to read at this thread. My 2€¢: Rank9/Select9 and full ranking structures, not rank/select-in-a-word algorithms. I'm using Long.bitCount(), Long.numberOfTralilingZeros() etc. everywhere since Jan 2013. Intrinsification is a bet, but it pays, and using them will convince them IBM and others to intrinsify. If only Oracle would add Long.clearLowestBitSet() ... The alternative algorithm for sideways addition (a.k.a. popcount) should be called "broadword" or Wilkes–Wheeler–Gill. Selection gets better and better every day. http://vigna.di.unimi.it/select.php tries to keep track of the improvement. The current code in it.unimi.dsi.bits.Fast is the best I'm aware of. Ciao!
          Hide
          Paul Elschot added a comment -

          The name select9 was meanwhile replaced by select in the code here.

          Does using Long.* everywhere include using them for selection of a bit from a 64 word (in a loop)?

          I tried the link for select.php a few times, but I got a 404 Not Found.

          Show
          Paul Elschot added a comment - The name select9 was meanwhile replaced by select in the code here. Does using Long.* everywhere include using them for selection of a bit from a 64 word (in a loop)? I tried the link for select.php a few times, but I got a 404 Not Found.
          Hide
          Sebastiano Vigna added a comment -

          Sorry.

          http://vigna.di.unimi.it/Sux/select.php

          No, as I wrote the select-in-a-word code is the one in it.unimi.dsi.bits.Fast (DSI utilities). Looping to select works only on very sparse words, and Elias-Fano upper bits are about half zeroes and half ones (or about two thirds zeroes and one third ones, for unshifted strictly monotone sequences).

          That might change if Oracle give us intrinsified Long.clearLowestBit(). Some tests on Philip Prunin Facebook's code show that his simple strategy (use popcount to decide whether to select in the upper or lower half, and then loop) can be competitive if you have one-clock deletion of the lowest bit.

          In any case, we're talking about shaving 0.5 ns from the final phase of a search. Maybe that's overengineering LOL.

          BTW, in principle it could be possible to fix the problem with non-intrinsifying JVMs by using some replacement tricks like those used for running Java 6 code on Java 5 (e.g., retroweaver). It should be possible to have bitCount() running the broadword algorithm at no cost.

          Show
          Sebastiano Vigna added a comment - Sorry. http://vigna.di.unimi.it/Sux/select.php No, as I wrote the select-in-a-word code is the one in it.unimi.dsi.bits.Fast (DSI utilities). Looping to select works only on very sparse words, and Elias-Fano upper bits are about half zeroes and half ones (or about two thirds zeroes and one third ones, for unshifted strictly monotone sequences). That might change if Oracle give us intrinsified Long.clearLowestBit(). Some tests on Philip Prunin Facebook's code show that his simple strategy (use popcount to decide whether to select in the upper or lower half, and then loop) can be competitive if you have one-clock deletion of the lowest bit. In any case, we're talking about shaving 0.5 ns from the final phase of a search. Maybe that's overengineering LOL. BTW, in principle it could be possible to fix the problem with non-intrinsifying JVMs by using some replacement tricks like those used for running Java 6 code on Java 5 (e.g., retroweaver). It should be possible to have bitCount() running the broadword algorithm at no cost.

            People

            • Assignee:
              Adrien Grand
              Reporter:
              Paul Elschot
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development