Hive > HIVE-4160 Vectorized Query Execution in Hive > HIVE-6017

Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura.

      This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also "mutable" so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection.
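      The mutability point above can be illustrated with a minimal sketch (the `MutableCents` class below is hypothetical, for illustration only — it is not the Decimal128 API, and it uses 64-bit scaled cents rather than 128-bit precision). An accumulator that updates its own fields in place allocates nothing inside the loop, whereas immutable BigDecimal creates a new object per operation:

```java
import java.math.BigDecimal;

// Hypothetical illustration of the mutable-accumulator idea behind Decimal128.
public class MutableVsImmutable {

    // Toy mutable fixed-point value with 2 decimal digits of scale.
    static final class MutableCents {
        long cents;                        // state mutated in place
        void addDestructive(long c) { cents += c; }
        BigDecimal toBigDecimal() { return BigDecimal.valueOf(cents, 2); }
    }

    // Mutable style: zero allocations inside the loop.
    static BigDecimal sumMutable(int n) {
        MutableCents acc = new MutableCents();
        for (int i = 0; i < n; i++) acc.addDestructive(1);
        return acc.toBigDecimal();
    }

    // Immutable style: one new BigDecimal per iteration.
    static BigDecimal sumImmutable(int n) {
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) sum = sum.add(new BigDecimal("0.01"));
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumMutable(1000) + " == " + sumImmutable(1000));
    }
}
```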

      1. HIVE-6017.04.patch
        221 kB
        Eric Hanson
      2. HIVE-6017.03.patch
        222 kB
        Eric Hanson
      3. HIVE-6017.02.patch
        235 kB
        Eric Hanson
      4. HIVE-6017.01.patch
        235 kB
        Eric Hanson

        Issue Links

          Activity

          Transition                       Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> In Progress              1d 15h 30m              1                 Eric Hanson     13/Dec/13 17:25
          In Progress -> Patch Available   3d 6h 29m               1                 Eric Hanson     16/Dec/13 23:55
          Patch Available -> Resolved      16d 23h 26m             1                 Eric Hanson     02/Jan/14 23:21
          Michael Howard added a comment -

          These are observations about the problems with "native" multiplication and division in Decimal128.

          Hive 0.13 source code is visible at:
          http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-common/0.13.0/org/apache/hadoop/hive/common/type/Decimal128.java

          The "native" Decimal128 multiply and divide code is not in use.

          Instead, there is a comment which says:
          > As of 2/11/2014 this has a known bug in multiplication. ...
          > The fix employed now is to replace this code with code that uses Java BigDecimal multiply.

          The fundamental problem with the multiply code is that UnsignedInt128.multiplyArrays4And4To4NoOverflow() is broken.

          There are multiple stages where the intermediate/temp variable 'product' is calculated.
          Unfortunately, carry information is being lost (out the top) whenever the results of two or more multiplications are being summed together.

          It is perfectly fine to say:
          uint64 = uint32 * uint32
          or
          uint64 = uint32 * uint32 + carry

          However, the following is invalid because it will lose carry information:
          uint64 = (uint32 * uint32) + (uint32 * uint32)

          An example case is:
          0xFFFFFFFFL * 0xFFFFFFFFL + 0xFFFFFFFFL * 0xFFFFFFFFL

          This sum cannot be held in a uint64 ... a carry out the top is lost.

          This is easier to envision if you do the same operation with 8=>16 bits instead of 32=>64 bits:
          0xFF * 0xFF = 0xFE01 // ok
          0xFF * 0xFF + 0xFF = 0xFE01 + 0xFF = 0xFEFF // ok
          0xFF * 0xFF + 0xFF * 0xFF = 0xFE01 + 0xFE01 = 0x1FC02 // not ok ... overflows 16 bits ... carry lost
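          The overflow above can be reproduced directly in Java (this demo class is mine, for illustration; Java's long plays the role of uint64 here, and BigInteger provides the exact result for comparison):

```java
import java.math.BigInteger;

public class CarryLossDemo {
    // Sum of two unsigned 32x32-bit products, kept in a Java long (64 bits).
    // This silently drops any carry out of bit 63 — the bug pattern described.
    static long sumOfProductsLong(long a, long b, long c, long d) {
        return a * b + c * d;
    }

    // The same sum with arbitrary precision, for comparison.
    static BigInteger sumOfProductsExact(long a, long b, long c, long d) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
                .add(BigInteger.valueOf(c).multiply(BigInteger.valueOf(d)));
    }

    public static void main(String[] args) {
        long m = 0xFFFFFFFFL; // max unsigned 32-bit value
        // Exact result 0x1FFFFFFFC00000002 needs 65 bits; the long keeps only the low 64.
        System.out.println("exact   = 0x" + sumOfProductsExact(m, m, m, m).toString(16));
        System.out.println("in long = 0x" + Long.toHexString(sumOfProductsLong(m, m, m, m)));
    }
}
```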

          The UnsignedInt128.multiplyArrays4And4To4NoOverflow() and UnsignedInt128.multiplyArrays4And4To8() methods use this pattern repeatedly, and therefore are broken.

          These routines need to be implemented in such a way that the right-hand side contains no more than a single multiply (uint32 * uint32) and a single addition (+ uint32).

          This is fine:
          1899 product = (right[0] & SqlMathUtil.LONG_MASK)
          1900 * (left[0] & SqlMathUtil.LONG_MASK);
          1901 int z0 = (int) product;

          This is not ... the sum will overflow and 'product' will have lost the carry
          1903 product = (right[0] & SqlMathUtil.LONG_MASK)
          1904 * (left[1] & SqlMathUtil.LONG_MASK)
          1905 + (right[1] & SqlMathUtil.LONG_MASK)
          1906 * (left[0] & SqlMathUtil.LONG_MASK) + (product >>> 32);
          1907 int z1 = (int) product;

          This code needs to be rewritten as a sum of terms where each term is a uint32[4]*uint32 which is left shifted.

          UnsignedInt128.multiplyDestructive(int) and UnsignedInt128.addDestructive(int[]) are just fine. SqlMathUtil.multiplyMultiprecision(int[], int) is also fine.

          UnsignedInt128.multiplyArrays4And4To4NoOverflow() and UnsignedInt128.multiplyArrays4And4To8() need to be implemented using these as building blocks.
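          The carry-safe approach described above can be sketched as a plain schoolbook multiply (the class and method names below are hypothetical, not the Hive API). Each inner step is one uint32 x uint32 product plus at most two uint32 additions, which still fits exactly in 64 bits — (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1 — so no carry can be lost:

```java
import java.math.BigInteger;

public class SchoolbookMul {
    private static final long MASK = 0xFFFFFFFFL;

    // Multiplies two little-endian 4-word unsigned operands into 8 words.
    // Each inner step is one uint32*uint32 product plus at most two uint32
    // additions, so the 64-bit accumulator never overflows.
    static int[] mul4x4To8(int[] left, int[] right) {
        int[] z = new int[8];
        for (int i = 0; i < 4; i++) {
            long carry = 0;
            for (int j = 0; j < 4; j++) {
                long t = (left[i] & MASK) * (right[j] & MASK)
                       + (z[i + j] & MASK) + carry;   // <= 2^64 - 1, cannot wrap
                z[i + j] = (int) t;
                carry = t >>> 32;
            }
            z[i + 4] = (int) carry;
        }
        return z;
    }

    // Converts a little-endian uint32 word array to a BigInteger, for checking.
    static BigInteger toBig(int[] w) {
        BigInteger r = BigInteger.ZERO;
        for (int k = w.length - 1; k >= 0; k--)
            r = r.shiftLeft(32).or(BigInteger.valueOf(w[k] & MASK));
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF}; // 2^128 - 1
        int[] p = mul4x4To8(a, a);
        System.out.println(toBig(p).equals(toBig(a).multiply(toBig(a))));
    }
}
```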

          The goal of this was to provide "high performance". The potential benefit comes from the fact that these objects are mutable, which avoids the object creation/GC associated with BigDecimal. Therefore, it is a little painful to see:

          1173 public void multiplyDestructive(Decimal128 right, short newScale) {
          1174 HiveDecimal rightHD = HiveDecimal.create(right.toBigDecimal());
          1175 HiveDecimal thisHD = HiveDecimal.create(this.toBigDecimal());
          1176 HiveDecimal result = thisHD.multiply(rightHD);
          ...
          1185 this.update(result.bigDecimalValue().toPlainString(), newScale);

          Enough on multiplication. Regarding division ...

          1240 public void divideDestructiveNativeDecimal128

          has a comment:
          As of 1/20/2014 this has a known bug in division. See TestDecimal128.testKnownPriorErrors().

          TestDecimal128.testKnownPriorErrors() uses both Decimal128 multiplication and Decimal128 division. We know that there is a problem with multiplication. It is unclear whether there is also a problem with Decimal128 division.

          I have not investigated Decimal128.division in detail.
          Based upon a quick glance, I did not see any fundamental problems with SqlMathUtil.divideMultiPrecision(). So the fundamental underpinnings might be OK.

          However, I did not see any reference to RoundingMode behavior in Decimal128.divideDestructiveNativeDecimal128().
          Decimal128 should produce the same results as BigDecimal ... that is at least desired, and probably required.
          In order to accomplish this Decimal128 will need to deal with RoundingModes ... or at least round consistently with RoundingMode.HALF_UP.
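          As a reference point for that last paragraph, here is a minimal sketch of the BigDecimal HALF_UP behavior a native Decimal128 divide would have to reproduce (the class name and the specific values are mine, for illustration — they are not from the Hive test suite):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class HalfUpReference {

    // Divide two decimal strings at the given scale with HALF_UP rounding
    // (ties round away from zero), returning the plain string form.
    static String divHalfUp(String x, String y, int scale) {
        return new BigDecimal(x)
                .divide(new BigDecimal(y), scale, RoundingMode.HALF_UP)
                .toPlainString();
    }

    // Rescale a decimal string with HALF_UP, to show the tie-breaking rule.
    static String rescaleHalfUp(String x, int scale) {
        return new BigDecimal(x).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(divHalfUp("1", "3", 4));      // 0.3333 (rounds down)
        System.out.println(divHalfUp("2", "3", 4));      // 0.6667 (rounds up)
        System.out.println(rescaleHalfUp("0.25", 1));    // 0.3    (tie goes away from zero)
    }
}
```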

          Lefty Leverenz added a comment -

          Okay, thanks Eric.

          Eric Hanson added a comment -

          I don't think this needs end-user documentation. This is an internal performance enhancement. The user-visible type system won't change.

          Lefty Leverenz added a comment -

          What documentation does this need – just add DECIMAL128 to the list of types with a brief explanation?

          Eric Hanson made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.13.0 [ 12324986 ]
          Resolution Fixed [ 1 ]
          Eric Hanson added a comment -

          Committed to trunk. Thank you Microsoft and Hideaki!

          Eric Hanson added a comment -

          I checked up on this. Microsoft has signed the Apache Corporate Contributor License Agreement (CCLA) and the people who wrote the code signed the Apache ICLA. These agreements plus the copyright notice in the code cover the situation.

          Jitendra Nath Pandey added a comment -

          The code looks good to me. +1
          It seems the copyright needs to be mentioned in the NOTICE file as well, although I am not an expert on these rules. Please also refer to http://www.apache.org/licenses/ to comply with the guidelines when submitting code with employer copyright or third-party code. Does it require a Software Grant Agreement (SGA) with the PMC?

          Eric Hanson made changes -
          Link This issue is depended upon by HIVE-6051 [ HIVE-6051 ]
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12619021/HIVE-6017.04.patch

          SUCCESS: +1 4840 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/662/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/662/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12619021

          Eric Hanson made changes -
          Attachment HIVE-6017.04.patch [ 12619021 ]
          Eric Hanson added a comment -

          remove trailing white space

          Eric Hanson made changes -
          Attachment HIVE-6017.03.patch [ 12619017 ]
          Eric Hanson made changes -
          Attachment HIVE-6017.02.patch [ 12619011 ]
          Eric Hanson added a comment -

          Renamed new test classes to start with Test rather than end with Test per instructions from Brock Noland.

          Eric Hanson added a comment -

          Code review available at https://reviews.apache.org/r/16307/

          Eric Hanson made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Affects Version/s 0.13.0 [ 12324986 ]
          Eric Hanson added a comment -

          Added Decimal128 package into Hive package org.apache.hadoop.hive.common.type. Verified that it compiles and unit tests pass.

          Eric Hanson made changes -
          Attachment HIVE-6017.01.patch [ 12618997 ]
          Eric Hanson made changes -
          Link This issue is depended upon by HIVE-5762 [ HIVE-5762 ]
          Eric Hanson made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          Eric Hanson created issue -

            People

            • Assignee:
              Eric Hanson
              Reporter:
              Eric Hanson
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development