Details
Description
Steps to reproduce the bug:
- Create a DataFrame DF, and sample it with a fixed seed.
- Collect the sampled DataFrame -> result1
- Apply a particular UDF to that same sampled DataFrame -> result2
You would expect result1 and result2 to be built from the same rows of DF, but they appear not to be.
Note: result1 and result2 are each deterministic, so the mismatch is reproducible.
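The steps above can be sketched in PySpark roughly as follows. This is not the code from the attached notebook: the row count, sampling fraction, seed, and the identity UDF are placeholder assumptions, and the function names `symmetric_difference` and `reproduce` are hypothetical. Whether a mismatch actually shows up may depend on the Spark version and on the UDF used.

```python
def symmetric_difference(ids1, ids2):
    """Return the sorted ids that appear in exactly one of the two results."""
    return sorted(set(ids1) ^ set(ids2))


def reproduce(fraction=0.1, seed=42, num_rows=1000):
    """Run the three steps on a local Spark session (requires PySpark)."""
    # Imported lazily so the comparison helper above works without PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("sample-udf-repro")
        .getOrCreate()
    )
    try:
        # Step 1: create DF and sample it with a fixed seed.
        df = spark.range(0, num_rows)
        sampled = df.sample(withReplacement=False, fraction=fraction, seed=seed)

        # Step 2: collect the sampled DataFrame -> result1.
        result1 = [row["id"] for row in sampled.collect()]

        # Step 3: apply a UDF to the same sampled DataFrame -> result2.
        # The identity UDF is a placeholder, not the UDF from the notebook.
        identity = udf(lambda x: x, LongType())
        with_udf = sampled.select(identity(col("id")).alias("id"))
        result2 = [row["id"] for row in with_udf.collect()]

        # An empty list means both results were built from the same rows.
        return symmetric_difference(result1, result2)
    finally:
        spark.stop()
```

Calling `reproduce()` returns the ids present in only one of the two results; a non-empty list would correspond to the behavior described in this report.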
See the attached notebook for details. Cells in the notebook were executed in order.
Attachments
Issue Links
- is duplicated by: SPARK-15382 monotonicallyIncreasingId doesn't work when data is upsampled (Closed)
- links to