Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25585

Allow users to specify scale of result in Decimal arithmetic

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.3.0
    • None
    • SQL

    Description

      The current behavior of Spark Decimal during arithmetic makes it difficult for users to achieve their desired level of precision. Numeric literals are automatically cast to unlimited precision during arithmetic, but the final result is cast down depending on the precision and scale of the operands, according to MS SQL rules (discussed in other JIRA's). This final cast can cause substantial loss of scale.

      For example:

      scala> spark.sql("select 1.3/3.41").show(false)
      
      +--------------------------------------------------------+
      
      |(CAST(1.3 AS DECIMAL(3,2)) / CAST(3.41 AS DECIMAL(3,2)))|
      
      +--------------------------------------------------------+
      
      |0.381232                                                |
      
      +--------------------------------------------------------+

      To get higher scale in the result, a user must cast the operands to higher scale:

      scala> spark.sql("select cast(1.3 as decimal(5,4))/cast(3.41 as decimal(5,4))").show(false)
      
      +--------------------------------------------------------+
      
      |(CAST(1.3 AS DECIMAL(5,4)) / CAST(3.41 AS DECIMAL(5,4)))|
      
      +--------------------------------------------------------+
      
      |0.3812316716                                            |
      
      +--------------------------------------------------------+
      
      scala> spark.sql("select cast(1.3 as decimal(10,9))/cast(3.41 as decimal(10,9))").show(false)
      
      +----------------------------------------------------------+
      
      |(CAST(1.3 AS DECIMAL(10,9)) / CAST(3.41 AS DECIMAL(10,9)))|
      
      +----------------------------------------------------------+
      
      |0.38123167155425219941                                    |
      
      +----------------------------------------------------------+

      But if the user casts too high, the result's scale decreases. 

      scala> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as decimal(25,24))").show(false)
      
      +------------------------------------------------------------+
      
      |(CAST(1.3 AS DECIMAL(25,24)) / CAST(3.41 AS DECIMAL(25,24)))|
      
      +------------------------------------------------------------+
      
      |0.3812316715543                                             |
      
      +------------------------------------------------------------+

      Thus, the user has no way of knowing how to cast to get the scale he wants. This problem is even harder to deal with when using variables instead of literals. 

      The user should be able to explicitly set the desired scale of the result. MySQL offers this capability in the form of a system variable called "div_precision_increment."

      From the MySQL docs: "In division performed with /, the scale of the result when using two exact-value operands is the scale of the first operand plus the value of the div_precision_increment system variable (which is 4 by default). For example, the result of the expression 5.05 / 0.014 has a scale of six decimal places (360.714286)."

      mysql> SELECT 1/7;
      +--------+
      | 1/7    |
      +--------+
      | 0.1429 |
      +--------+
      mysql> SET div_precision_increment = 12;
      mysql> SELECT 1/7;
      +----------------+
      | 1/7            |
      +----------------+
      | 0.142857142857 |
      +----------------+

      This gives the user full control of the result's scale after arithmetic and obviates the need for casting all over the place.

      Since Spark 2.3, we already have DecimalType.MINIMUM_ADJUSTED_SCALE, which is similar to div_precision_increment. It just needs to be made modifiable by the user. 

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            bkestelman Benito Kestelman
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: