[SPARK-45216] Fix non-deterministic seeded Dataset APIs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 4.0.0
Component/s: Connect, SQL
Labels:
- pull-request-available

Description

Running the following example produces two equal columns, as expected:

import org.apache.spark.sql.functions.rand

val c = rand()
df.select(c, c).show()

+--------------------------+--------------------------+
|rand(-4522010140232537566)|rand(-4522010140232537566)|
+--------------------------+--------------------------+
|        0.4520819282997137|        0.4520819282997137|
+--------------------------+--------------------------+

However, other similar APIs give incorrect results: selecting the same column twice yields different values:

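import org.apache.spark.sql.functions.{col, random, shuffle, uuid}

// df is assumed to have an array column named x
val r1 = random()
val r2 = uuid()
val r3 = shuffle(col("x"))
val x = df.select(r1, r1, r2, r2, r3, r3)
x.show()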

+------------------+------------------+--------------------+--------------------+----------+----------+
|            rand()|            rand()|              uuid()|              uuid()|shuffle(x)|shuffle(x)|
+------------------+------------------+--------------------+--------------------+----------+----------+
|0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
+------------------+------------------+--------------------+--------------------+----------+----------+
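The column headers above hint at one plausible cause: `rand` prints with a seed (`rand(-4522010140232537566)`), while the failing columns print without one (`rand()`, `uuid()`, `shuffle(x)`), which suggests their seed is not fixed when the column is created. The following is a minimal sketch of that difference in plain Scala, not Spark code. `Expr`, `SeededRand`, `UnseededRand` and `SeedCaptureSketch` are hypothetical names introduced only for illustration, and the model assumes each use of an expression is evaluated independently.

```scala
import scala.util.Random

// Hypothetical stand-ins for column expressions; these are not Spark classes.
sealed trait Expr { def eval(): Double }

// The seed is fixed when the expression is constructed.
final case class SeededRand(seed: Long) extends Expr {
  def eval(): Double = new Random(seed).nextDouble()
}

// The seed is drawn only at evaluation time, so every use gets a fresh one.
case object UnseededRand extends Expr {
  def eval(): Double = new Random(Random.nextLong()).nextDouble()
}

object SeedCaptureSketch {
  // Evaluates each selected expression independently, like two columns of one row.
  def select(exprs: Expr*): Seq[Double] = exprs.map(_.eval())

  def main(args: Array[String]): Unit = {
    val c = SeededRand(Random.nextLong()) // seed chosen once, at creation
    val same = select(c, c)
    println(same(0) == same(1)) // both uses share one seed, so the values match

    val r = UnseededRand
    val differ = select(r, r)
    // The two seeds are independent, so the values are very unlikely to match.
    println(differ(0) == differ(1))
  }
}
```

Under this assumption, reusing a value is only deterministic when the seed is part of the value itself, which is the behavior the report expects from all of these APIs.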

Issue Links

Links to: GitHub Pull Request #42997

People

Assignee: Peter Toth
Reporter: Peter Toth
Votes: 0
Watchers: 2

Dates

Created: 19/Sep/23 12:00
Updated: 25/Nov/23 00:56
Resolved: 21/Sep/23 00:36