Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Use table lookup instead of some broadword manipulations

      1. LUCENE-6040.patch
        19 kB
        Paul Elschot
      2. LUCENE-6040.patch
        7 kB
        Paul Elschot

        Activity

        Hide
        Paul Elschot added a comment -

        This adds two bit select methods to BroadWord.java.

        selectWithByteLookup() works like select() but uses a lookup to select a bit from a byte in its last phase.

        selectWithOverflowAndByteLookup() also adds a lookup to determine the byte of the bit to be selected.

        Some very simple benchmarks on my machine (i7-4770, 3.4 GHz) are in the test code in the testPerf* methods. These show that selectWithByteLookup() takes about 20% less time than select(), and that selectWithOverflowAndByteLookup() takes about 33% less time.

        Show
        Paul Elschot added a comment - This adds two bit select methods to BroadWord.java. selectWithByteLookup() works like select() but uses a lookup to select a bit from a byte in its last phase. selectWithOverflowAndByteLookup() also adds a lookup to determine the byte of the bit to be selected. Some very simple benchmarks on my machine (i7-4770, 3.4 GHz) are in the test code in the testPerf* methods. These show that selectWithByteLookup() takes about 20% less time than select(), and that selectWithOverflowAndByteLookup() takes about 33% less time.
        Hide
        Paul Elschot added a comment -

        These array lookups are given in the article of 2013 by Simon Gog and Matthias Petri, 2013, "Optimized Succinct Data Structures for Massive Data" in Section 6.2, currently available at http://people.eng.unimelb.edu.au/sgog/optimized.pdf .

        A quote from there:

        Unfortunately, the development of efficient select operations on 64 -bit integers ( select 64 ) has not been as rapid as for popcnt . There are no direct CPU instructions available to return the position of the i -th set bit.

        As was noted at http://vigna.di.unimi.it/Sux/select.php , these two lookups have the overhead of java array accesses, so it would probably be wortwhile to use a native version of this, at least until better processor support becomes available.

        For the patch here, I'd expect that some speedup in Java is still possible.

        Show
        Paul Elschot added a comment - These array lookups are given in the article of 2013 by Simon Gog and Matthias Petri, 2013, "Optimized Succinct Data Structures for Massive Data" in Section 6.2, currently available at http://people.eng.unimelb.edu.au/sgog/optimized.pdf . A quote from there: Unfortunately, the development of efficient select operations on 64 -bit integers ( select 64 ) has not been as rapid as for popcnt . There are no direct CPU instructions available to return the position of the i -th set bit. As was noted at http://vigna.di.unimi.it/Sux/select.php , these two lookups have the overhead of java array accesses, so it would probably be wortwhile to use a native version of this, at least until better processor support becomes available. For the patch here, I'd expect that some speedup in Java is still possible.
        Hide
        Adrien Grand added a comment - - edited

        I just played with the patch and it makes the EliasFanoDocIdSet faster on large advance indeed. It doesn't seem useful to keep 4 impls, so maybe we should just remove the current select impl in trunk and replace it with selectWithOverflowAndByteLookup?

        Show
        Adrien Grand added a comment - - edited I just played with the patch and it makes the EliasFanoDocIdSet faster on large advance indeed. It doesn't seem useful to keep 4 impls, so maybe we should just remove the current select impl in trunk and replace it with selectWithOverflowAndByteLookup ?
        Hide
        Paul Elschot added a comment -

        Replacing the existing implementation is indeed the goal, but after further testing, which you've already done.
        The initial version of the patch is still here for those curious about the other implementations.

        The selectWithOverflowAndByteLookup() method has an added precondition that the bit to select must exist.
        As I recall this should the case because there is always a bitCount() before the select() call.
        Anyway the existing docidset tests should easily catch this if necessary.

        Show
        Paul Elschot added a comment - Replacing the existing implementation is indeed the goal, but after further testing, which you've already done. The initial version of the patch is still here for those curious about the other implementations. The selectWithOverflowAndByteLookup() method has an added precondition that the bit to select must exist. As I recall this should the case because there is always a bitCount() before the select() call. Anyway the existing docidset tests should easily catch this if necessary.
        Hide
        Paul Elschot added a comment - - edited

        The new select() implementation is so much simpler than the previous one that I preferred to move it into BitUtil, and delete the BroadWord class completely in this patch.
        The javadocs for select() refer to this issue.
        The tests are also simplified and moved into a new TestBitUtil class.

        Show
        Paul Elschot added a comment - - edited The new select() implementation is so much simpler than the previous one that I preferred to move it into BitUtil, and delete the BroadWord class completely in this patch. The javadocs for select() refer to this issue. The tests are also simplified and moved into a new TestBitUtil class.
        Hide
        ASF subversion and git services added a comment -

        Commit 1636913 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1636913 ]

        LUCENE-6040: Speedup broadword bit selection.

        Show
        ASF subversion and git services added a comment - Commit 1636913 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1636913 ] LUCENE-6040 : Speedup broadword bit selection.
        Hide
        ASF subversion and git services added a comment -

        Commit 1636914 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1636914 ]

        LUCENE-6040: Speedup broadword bit selection.

        Show
        ASF subversion and git services added a comment - Commit 1636914 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636914 ] LUCENE-6040 : Speedup broadword bit selection.
        Hide
        Adrien Grand added a comment -

        This looks great, I just committed the patch.

        Thanks Paul!

        Show
        Adrien Grand added a comment - This looks great, I just committed the patch. Thanks Paul!
        Hide
        ASF subversion and git services added a comment -

        Commit 1636963 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1636963 ]

        LUCENE-6040: svn:eol-style native

        Show
        ASF subversion and git services added a comment - Commit 1636963 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1636963 ] LUCENE-6040 : svn:eol-style native
        Hide
        ASF subversion and git services added a comment -

        Commit 1636964 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1636964 ]

        LUCENE-6040: svn:eol-style native (merge r1636963)

        Show
        ASF subversion and git services added a comment - Commit 1636964 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1636964 ] LUCENE-6040 : svn:eol-style native (merge r1636963)
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Unassigned
            Reporter:
            Paul Elschot
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development