Spark / SPARK-33678

Numerical product aggregation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.0, 3.1.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None
    • Flags: Patch

    Description

      There is currently no facility in spark.sql.functions to allow computation of the product of all numbers in a grouping expression. Such a facility would likely be useful when computing statistical quantities such as the combined probability of a set of independent events, or in financial applications when calculating a cumulative interest rate.
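
To make the requested behaviour concrete, here is a minimal plain-Python sketch of a per-group product. It is an illustration only, not Spark code and not the proposed implementation; the helper name `product_by_group` and the sample growth factors are hypothetical.

```python
from collections import defaultdict
from math import prod  # available in Python 3.8 and later


def product_by_group(rows):
    """Return {key: product of values} for an iterable of (key, value) pairs."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: prod(values) for key, values in groups.items()}


# Hypothetical data: yearly growth factors (1 + interest rate) per account.
growth = [("a", 1.02), ("a", 1.03), ("b", 1.05), ("b", 1.01), ("b", 1.04)]

# The product of each group's factors is that account's cumulative growth.
cumulative = product_by_group(growth)
print(cumulative)
```

A built-in aggregate would do the same thing per group inside the query engine, without collecting the values first.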

      Although it is certainly possible to emulate this by an expression of the form exp(sum(log(column))), this has a number of significant drawbacks:

      • It involves computationally costly functions (exp, log)
      • It is more verbose than something like product(column)
      • It is more prone to numerical inaccuracy than direct multiplication, particularly when the values are close to one
      • It does not handle zeros or negative numbers cleanly, because the logarithm is undefined for non-positive inputs
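
The drawbacks listed above can be seen in a small plain-Python sketch that compares direct multiplication with the exp(sum(log(x))) emulation. The function names are hypothetical, and Python's `math.log` raises an error on non-positive input, which may differ from how Spark's own `log` treats such values.

```python
import math


def product_direct(values):
    """Multiply the values together directly."""
    return math.prod(values)


def product_via_logs(values):
    """Emulate a product as exp(sum(log(x))); only valid for positive x."""
    return math.exp(sum(math.log(v) for v in values))


positive = [2.0, 0.5, 4.0]
print(product_direct(positive))    # direct multiplication
print(product_via_logs(positive))  # same value, up to rounding error

# Direct multiplication copes with zeros and negative numbers.
print(product_direct([2.0, -3.0, 0.5]))
print(product_direct([1.5, 0.0, 7.0]))

# The log-based emulation cannot: math.log rejects non-positive input.
try:
    product_via_logs([2.0, -3.0, 0.5])
except ValueError as err:
    print("log-based emulation failed:", err)
```

Handling signs and zeros in the emulation requires extra bookkeeping, such as tracking the sign separately, which adds to the verbosity noted above.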

      I am currently developing an addition to sql.functions, which involves a new Catalyst aggregation expression. This needs some additional testing, and I hope to issue a pull-request soon.

          People

            Assignee: Richard Penney (rwpenney)
            Reporter: Richard Penney (rwpenney)
            Votes: 0
            Watchers: 2
