Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-26566

Optimize encodeNumeric in OrderedBytes

    XMLWordPrintableJSON

Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.5.0, 3.0.0-alpha-3
    • Performance
    • None
    • Reviewed

    Description

      Found current estimate E step in OrderedBytes is to search step by step in loops. We can directly calculate the length to move the point and let the time complexity become to O(1).
      From the comments of the method encodeNumericLarge:

      encodeNumericLarge

      Encode the large magnitude floating point number val using the key encoding. The caller guarantees that val will be
      finite and abs(val) >= 1.0.
      A floating point value is encoded as an integer exponent E and a mantissa M. The original value is equal to (M * 100^E). E is set to the smallest value possible without making M greater than or equal to 1.0.
      Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is X (hence X>=0 and X<=99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailing X==0 digits areĀ omitted. This means that the mantissa will never contain a byte with the value 0x00.
      The function could be divided into 4 steps:

      1. Confirm the sign (Negative or not)
      2. Normalise the abs(val), transformed to (M * 100^E)
      3. Encode M by peeling off centimal digit and
      4. Encode X

      At the step 2, we normalise abs(val) and determine the exponent E.
      The current implementation is :

          while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); e +=16; }
          while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); e+= 4; }
          while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = abs.movePointLeft(2); e++; }
      

      Which just move the point of abs(val), (which is in form yyyy.zzzz, where y & z are digits, e.g. 1000.0001) from right to left step by step, until abs(val) < 1.
      In the loop body, the increment of e is always the half of steps the point moved to left. (step=32, deltaE=16), (step=8, deltaE=4), (step=2, deltaE=1)
      As e starts from 0, the value of e must be the half of the total moved steps at last.
      So that we could save the above three loops and calculate the moved length and e directly.
      My implementation:

          // This is the number of digits of the Integer part of abs(val) when val > 10
          int integerDigits = abs.precision() - abs.scale();
          // If length of the Integer part is odd, from the last loop above, we actually moved one more step forward.
          int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : integerDigits + 1;
          e = lengthToMoveLeft / 2;
          // The edge condition.
          if (e > 350) {
            e = 351;
            lengthToMoveLeft = 702;
          }
          abs = abs.movePointLeft(lengthToMoveLeft);
      

      In this case, we only move the point once. The time complexity is O(1).
      Same idea to the method of encodeNumericSmall.
      From the JMH test, (JmhBenchmark.java & Benchmark.log in attachments), the throughput has an incredible improvement.

      At step 3 peeling off M into centimals. Current implementation is:

      for (int i = 0; i < 18 && abs.compareTo(BigDecimal.ZERO) != 0; i++) {
            abs = abs.movePointRight(2); // Will create new BigDecimal object.
            d = abs.intValue();
            dst.put((byte) (2 * d + 1));
            abs = abs.subtract(BigDecimal.valueOf(d)); // Will create new BigDecimal object.
      }
      

      Also move the point of big decimal step by step. BigDecimal operations are costly and create plenty of new BigDecimal objects, which are only used once. Here I tried to directly encoding the BigDecimal based on the string to avoid BigDecimal operations.
      My implementation:

      private static void encodeToCentimal(PositionedByteRange dst, BigDecimal val) {
          String stringOfAbs = val.stripTrailingZeros().toPlainString();
          String value = stringOfAbs.substring(stringOfAbs.indexOf('.') + 1);
          int d;
      
          int maxPrecision = Math.min(MAX_NUM_ENCODE_BYTES * 2, value.length());
          for (int i = 0; i < maxPrecision; i += 2) {
            d = (value.charAt(i) - '0') * 10;
            if (i + 1 < maxPrecision) {
              d += (value.charAt(i + 1) - '0');
            }
            dst.put((byte) (2 * d + 1));
          }
        }
      

      From the JMH test, (EncodingBenchmark.java & Benchmark-encoding.log) this raises the throughtput of the encoding about 200%.

      Attachments

        1. Benchmark.log
          4 kB
          Yutong Xiao
        2. Benchmark-encoding.log
          4 kB
          Yutong Xiao
        3. EncodingBenchmark.java
          2 kB
          Yutong Xiao
        4. JmhBenchmark.java
          2 kB
          Yutong Xiao

        Issue Links

          Activity

            People

              xytss123 Yutong Xiao
              xytss123 Yutong Xiao
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: