Lucene - Core
  1. Lucene - Core
  2. LUCENE-4690

Optimize NumericUtils.*ToPrefixCoded(), add versions that don't hash

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      As far as I can tell nothing actually uses the hash codes generated by these methods (not even any tests). If someone did want to generate a hash, it would be just as fast to do it on the BytesRef after the fact (or even faster from the input number itself).

      edit: Uwe pointed out they were used in one place. Other places still don't need it.

      1. LUCENE-4690.patch
        17 kB
        Yonik Seeley

        Activity

        Hide
        Yonik Seeley added a comment -

        Current code:

          public static int longToPrefixCoded(final long val, final int shift, final BytesRef bytes) {
            if (shift>63 || shift<0)
              throw new IllegalArgumentException("Illegal shift value, must be 0..63");
            int hash, nChars = (63-shift)/7 + 1;
            bytes.offset = 0;
            bytes.length = nChars+1;
            if (bytes.bytes.length < bytes.length) {
              bytes.grow(NumericUtils.BUF_SIZE_LONG);
            }
            bytes.bytes[0] = (byte) (hash = (SHIFT_START_LONG + shift));
            long sortableBits = val ^ 0x8000000000000000L;
            sortableBits >>>= shift;
            while (nChars > 0) {
              // Store 7 bits per byte for compatibility
              // with UTF-8 encoding of terms
              bytes.bytes[nChars--] = (byte)(sortableBits & 0x7f);
              sortableBits >>>= 7;
            }
            // calculate hash
            for (int i = 1; i < bytes.length; i++) {
              hash = 31*hash + bytes.bytes[i];
            }
            return hash;
          }
        

        Proposed code template (i.e. for all of the *ToPrefixCoded methods):

          public void longToPrefixCodedBytes(final long val, final int shift, final BytesRef bytes) {
            assert (shift & ~0x3f) == 0;  // ensure shift is 0..63
            int nChars = (63-shift)/7 + 1;
            bytes.offset = 0;
            bytes.length = nChars+1;   // one extra for the byte that contains the shift info
            if (bytes.bytes.length < bytes.length) {
              bytes.bytes = new byte[NumericUtils.BUF_SIZE_LONG];  // use the max
            }
            bytes.bytes[0] = (byte)(SHIFT_START_LONG + shift);
            long sortableBits = val ^ 0x8000000000000000L;
            sortableBits >>>= shift;
            while (nChars > 0) {
              // Store 7 bits per byte for compatibility
              // with UTF-8 encoding of terms
              bytes.bytes[nChars--] = (byte)(sortableBits & 0x7f);
              sortableBits >>>= 7;
            }
          }
        

        Some of the changes:

        • Setting bytes.length to be larger than the current contained array temporarily puts BytesRef into an invalid state. Calling any BytesRef methods (like grow) while it is in that invalid state is suspect.
        • replace grow with simple allocation.
          1. grow over-allocates all the time. Most of the time (like here) it's wasted space.
          2. grow copies the previous buffer when allocating a bigger buffer. This is wasted/unneeded here.
        • removes hash code calculation
        Show
        Yonik Seeley added a comment - Current code: public static int longToPrefixCoded( final long val, final int shift, final BytesRef bytes) { if (shift>63 || shift<0) throw new IllegalArgumentException( "Illegal shift value, must be 0..63" ); int hash, nChars = (63-shift)/7 + 1; bytes.offset = 0; bytes.length = nChars+1; if (bytes.bytes.length < bytes.length) { bytes.grow(NumericUtils.BUF_SIZE_LONG); } bytes.bytes[0] = ( byte ) (hash = (SHIFT_START_LONG + shift)); long sortableBits = val ^ 0x8000000000000000L; sortableBits >>>= shift; while (nChars > 0) { // Store 7 bits per byte for compatibility // with UTF-8 encoding of terms bytes.bytes[nChars--] = ( byte )(sortableBits & 0x7f); sortableBits >>>= 7; } // calculate hash for ( int i = 1; i < bytes.length; i++) { hash = 31*hash + bytes.bytes[i]; } return hash; } Proposed code template (i.e. for all of the *ToPrefixCoded methods): public void longToPrefixCodedBytes( final long val, final int shift, final BytesRef bytes) { assert (shift & ~0x3f) == 0; // ensure shift is 0..63 int nChars = (63-shift)/7 + 1; bytes.offset = 0; bytes.length = nChars+1; // one extra for the byte that contains the shift info if (bytes.bytes.length < bytes.length) { bytes.bytes = new byte [NumericUtils.BUF_SIZE_LONG]; // use the max } bytes.bytes[0] = ( byte )(SHIFT_START_LONG + shift); long sortableBits = val ^ 0x8000000000000000L; sortableBits >>>= shift; while (nChars > 0) { // Store 7 bits per byte for compatibility // with UTF-8 encoding of terms bytes.bytes[nChars--] = ( byte )(sortableBits & 0x7f); sortableBits >>>= 7; } } Some of the changes: Setting bytes.length to be larger than the current contained array temporarily puts BytesRef into an invalid state. Calling any BytesRef methods (like grow) while it is in that invalid state is suspect. replace grow with simple allocation. 1. grow over-allocates all the time. Most of the time (like here) it's wasted space. 2. grow copies the previous buffer when allocating a bigger buffer. This is wasted/unneeded here. removes hash code calculation
        Hide
        Yonik Seeley added a comment -

        An additional cool little hack... even though it's much improved, IDIV still has a latency of 18-42 cycles on a core2 processor. One equivalent of i/7 is (i*37)>>8 for i in 0..63. This only takes 4 cycles.

        Show
        Yonik Seeley added a comment - An additional cool little hack... even though it's much improved, IDIV still has a latency of 18-42 cycles on a core2 processor. One equivalent of i/7 is (i*37)>>8 for i in 0..63. This only takes 4 cycles.
        Hide
        David Smiley added a comment -

        Curious, how do you know that and/or measure that?

        Show
        David Smiley added a comment - Curious, how do you know that and/or measure that?
        Hide
        Yonik Seeley added a comment -

        Curious, how do you know that and/or measure that?

        The number of cycles? It's all documented in various places. Of course one needs a good sense of what assembly a compiler/hotspot will emit.
        Integer multiply has been 3 cycles for quite a while for both Intel and AMD, and shifts have been a single cycle (after the ill-fated P4).

        http://gmplib.org/~tege/x86-timing.pdf
        http://www.agner.org/optimize/instruction_tables.pdf

        Show
        Yonik Seeley added a comment - Curious, how do you know that and/or measure that? The number of cycles? It's all documented in various places. Of course one needs a good sense of what assembly a compiler/hotspot will emit. Integer multiply has been 3 cycles for quite a while for both Intel and AMD, and shifts have been a single cycle (after the ill-fated P4). http://gmplib.org/~tege/x86-timing.pdf http://www.agner.org/optimize/instruction_tables.pdf
        Hide
        Uwe Schindler added a comment -

        As far as I can tell nothing actually uses the hash codes generated by these methods (not even any tests).

        The return value (the hash) is used by NumericTokenStreeam#NumericTermAttribute.fillBytesRef():

            @Override
            public int fillBytesRef() {
              try {
                assert valueSize == 64 || valueSize == 32;
                return (valueSize == 64) ? 
                  NumericUtils.longToPrefixCoded(value, shift, bytes) :
                  NumericUtils.intToPrefixCoded((int) value, shift, bytes);
              } catch (IllegalArgumentException iae) {
                // return empty token before first or after last
                bytes.length = 0;
                return 0;
              }
            }
        

        Other comments:

        • The masking away of invalid shifts is a no-go to me. This leads to unexpected behaviour.
        • A agree grow() does not need to be used for this stuff. We can simply reallocate, as we know size exactly.
        Show
        Uwe Schindler added a comment - As far as I can tell nothing actually uses the hash codes generated by these methods (not even any tests). The return value (the hash) is used by NumericTokenStreeam#NumericTermAttribute.fillBytesRef(): @Override public int fillBytesRef() { try { assert valueSize == 64 || valueSize == 32; return (valueSize == 64) ? NumericUtils.longToPrefixCoded(value, shift, bytes) : NumericUtils.intToPrefixCoded(( int ) value, shift, bytes); } catch (IllegalArgumentException iae) { // return empty token before first or after last bytes.length = 0; return 0; } } Other comments: The masking away of invalid shifts is a no-go to me. This leads to unexpected behaviour. A agree grow() does not need to be used for this stuff. We can simply reallocate, as we know size exactly.
        Hide
        Uwe Schindler added a comment -

        By the way, your patch above would corrumpt the IAE case in fillBytesRef used by indexer.

        Show
        Uwe Schindler added a comment - By the way, your patch above would corrumpt the IAE case in fillBytesRef used by indexer.
        Hide
        Yonik Seeley added a comment -

        The return value (the hash) is used by NumericTokenStreeam#NumericTermAttribute.fillBytesRef():

        Ahh, I didn't see it because the use of the value is on a separate line from the method call. Makes it hard to find.

        Show
        Yonik Seeley added a comment - The return value (the hash) is used by NumericTokenStreeam#NumericTermAttribute.fillBytesRef(): Ahh, I didn't see it because the use of the value is on a separate line from the method call. Makes it hard to find.
        Hide
        Yonik Seeley added a comment -

        Here's a patch that doubles the performance of NumericUtils.*ToPrefixCoded

        Show
        Yonik Seeley added a comment - Here's a patch that doubles the performance of NumericUtils.*ToPrefixCoded
        Hide
        Robert Muir added a comment -

        Seems good to me to have non-hashing versions (these versions exist for unicodeutil for terms already for similar purposes)

        Show
        Robert Muir added a comment - Seems good to me to have non-hashing versions (these versions exist for unicodeutil for terms already for similar purposes)
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1438242

        LUCENE-4690: Performance improvements and non-hashing versions of NumericUtils.*ToPrefixCoded

        Show
        Commit Tag Bot added a comment - [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1438242 LUCENE-4690 : Performance improvements and non-hashing versions of NumericUtils.*ToPrefixCoded
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1438253

        LUCENE-4690: Performance improvements a non-hashing versions of NumericUtils.*ToPrefixCoded

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1438253 LUCENE-4690 : Performance improvements a non-hashing versions of NumericUtils.*ToPrefixCoded
        Hide
        Uwe Schindler added a comment -

        Thanks, Yonik!

        Show
        Uwe Schindler added a comment - Thanks, Yonik!
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Unassigned
            Reporter:
            Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development