Description
There is currently no facility in spark.sql.functions for computing the product of all values within a group. Such a facility would likely be useful when computing statistical quantities such as the combined probability of a set of independent events, or in financial applications when calculating a cumulative interest rate.
Although it is certainly possible to emulate this by an expression of the form exp(sum(log(column))), this has a number of significant drawbacks:
- It involves computationally costly functions (exp, log)
- It is more verbose than something like product(column)
- It is more prone to numerical inaccuracies than directly multiplying a set of numbers, particularly when handling quantities that are close to one
- It does not handle zeros or negative numbers cleanly, because the logarithm is undefined for non-positive values
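The zero/negative drawback can be illustrated outside Spark with a small plain-Python sketch. It compares a direct product against the exp(sum(log(...))) emulation. The helper names `direct_product` and `log_sum_exp_product` are hypothetical, and Python's `math.log` is used as a stand-in for the logarithm (it raises `ValueError` on non-positive inputs rather than behaving exactly like any Spark SQL function).

```python
import math
from functools import reduce
from operator import mul


def direct_product(values):
    # Multiply the values directly, starting from the multiplicative identity 1.0.
    return reduce(mul, values, 1.0)


def log_sum_exp_product(values):
    # Emulate product(column) as exp(sum(log(column))).
    # math.log raises ValueError ("math domain error") for zero or negative inputs.
    return math.exp(sum(math.log(v) for v in values))


# The direct product is well defined for zeros and negative numbers...
print(direct_product([2.0, 0.0, 3.0]))  # 0.0
print(direct_product([-2.0, 3.0]))      # -6.0

# ...but the logarithm-based emulation cannot evaluate them at all.
for vals in ([2.0, 0.0, 3.0], [-2.0, 3.0]):
    try:
        log_sum_exp_product(vals)
    except ValueError as exc:
        print("log-based emulation failed:", exc)
```

Working around this with the log-based form requires extra logic (for example, tracking zeros and the sign separately), which adds to the verbosity drawback listed above.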
I am currently developing an addition to sql.functions that introduces a new Catalyst aggregation expression. It still needs some additional testing, and I hope to open a pull request soon.
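Like sum, a product aggregate can be computed in parts: each partition multiplies its own values into a partial buffer, and the partial buffers are then multiplied together. The plain-Python sketch below illustrates that update/merge/evaluate shape. It is not Catalyst code: `ProductAggregate` and `aggregate_partitions` are hypothetical names, and the null handling (skip nulls, return null for a group with no non-null input) is an assumption chosen to mirror sum.

```python
class ProductAggregate:
    """Hypothetical product aggregate split into initialize/update/merge/evaluate.

    Assumption: nulls (None) are skipped, and a group with no non-null input
    evaluates to None, mirroring how sum treats nulls.
    """

    def initialize(self):
        # No non-null input seen yet.
        return None

    def update(self, buffer, value):
        # Fold one input value into a partition-local buffer.
        if value is None:
            return buffer
        return value if buffer is None else buffer * value

    def merge(self, left, right):
        # Combine two partial buffers; multiplication is associative and commutative.
        if left is None:
            return right
        if right is None:
            return left
        return left * right

    def evaluate(self, buffer):
        return buffer


def aggregate_partitions(agg, partitions):
    # Phase 1: compute one partial buffer per partition.
    partials = []
    for part in partitions:
        buf = agg.initialize()
        for value in part:
            buf = agg.update(buf, value)
        partials.append(buf)
    # Phase 2: merge the partial buffers into the final result.
    result = agg.initialize()
    for partial in partials:
        result = agg.merge(result, partial)
    return agg.evaluate(result)


agg = ProductAggregate()
# Combined probability of independent events spread over two partitions.
print(aggregate_partitions(agg, [[0.5, None, 0.5], [0.25]]))  # 0.0625
print(aggregate_partitions(agg, [[None], []]))                # None
```

Because multiplication is associative and commutative, the result does not depend on how rows are split across partitions (up to floating-point rounding).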