I think that Spark SQL may be returning incorrect answers for queries that use multiple window specifications within the same expression. Here's an example that illustrates the problem.
Say that I have a table with a single numeric column and that I want to compute a cumulative distribution function over this column. Let's call this table nums:
It's easy to compute a running sum over this column:
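The original query and its output are not reproduced here. As a rough sketch of what a running sum looks like, the snippet below runs the SQL on SQLite (3.25 or newer, through Python's `sqlite3` module) as a stand-in engine rather than on Spark. The column name `x` and the four rows are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical data: the real contents of `nums` are not shown in this report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (x INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,), (4,)])

# Window spec #1: every row from the start of the ordering up to the current row.
running = conn.execute("""
    SELECT x,
           SUM(x) OVER (ORDER BY x
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
    FROM nums
    ORDER BY x
""").fetchall()

print(running)  # [(1, 1), (2, 3), (3, 6), (4, 10)]
```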
It's also easy to compute a total sum over all rows:
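For comparison, a total over all rows can be written with an empty window, so that every row sees the whole table. This is again only a sketch on SQLite via Python's `sqlite3`, not Spark, and the column `x` and its values are hypothetical.

```python
import sqlite3

# Hypothetical data: the real contents of `nums` are not shown in this report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (x INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,), (4,)])

# Window spec #2: an empty OVER () makes the frame span every row of the table.
totals = conn.execute("""
    SELECT x,
           SUM(x) OVER () AS total_sum
    FROM nums
    ORDER BY x
""").fetchall()

print(totals)  # [(1, 10), (2, 10), (3, 10), (4, 10)]
```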
Let's say that I combine these expressions to compute a CDF:
The result Spark returns for this query seems wrong. If we select the running total, the global total, and the combined expression in the same query, the first two values are computed correctly, but the combined expression is not:
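To make the expected relationship concrete, here is a sketch of the same side-by-side comparison on a reference engine. It uses SQLite (3.25 or newer) through Python's `sqlite3` rather than Spark, with a hypothetical column `x` and made-up rows, and it checks that the combined expression equals the running total divided by the global total on every row. The same invariant should hold for the Spark output.

```python
import sqlite3

# Hypothetical data: the real contents of `nums` are not shown in this report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (x INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,), (4,)])

# Two different window specs appear in one query, and also inside one expression (cdf).
rows = conn.execute("""
    SELECT x,
           SUM(x) OVER (ORDER BY x
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
           SUM(x) OVER () AS total_sum,
           CAST(SUM(x) OVER (ORDER BY x
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS REAL)
               / SUM(x) OVER () AS cdf
    FROM nums
    ORDER BY x
""").fetchall()

# Invariant: the combined column must agree with the two separate columns.
for x, running_sum, total_sum, cdf in rows:
    assert abs(cdf - running_sum / total_sum) < 1e-12, (x, running_sum, total_sum, cdf)

print(rows)
```

A row that fails this assertion would reproduce the kind of inconsistency described above: both inputs correct, but their quotient wrong.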
/cc Yin Huai