Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Versions: 4.0.0, 3.5.1, 3.4.3
Description
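The output DataFrame of `sample` is not immutable in Spark Connect: repeated actions on the same DataFrame return different results, whereas Spark Classic returns the same result every time.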
In Spark Classic:
In [1]: df = spark.range(10000).sample(0.1)
In [2]: [df.count() for i in range(10)]
Out[2]: [1006, 1006, 1006, 1006, 1006, 1006, 1006, 1006, 1006, 1006]
In Spark Connect:
In [1]: df = spark.range(10000).sample(0.1)
In [2]: [df.count() for i in range(10)]
Out[2]: [969, 1005, 958, 996, 987, 1026, 991, 1020, 1012, 979]
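The two outputs above are consistent with a difference in when the sampling seed gets fixed. This report does not state the cause, so the following is only a toy model in plain Python and not Spark code. `LazySample` and `pin_seed` are hypothetical names invented for the illustration. The sketch shows that a lazily evaluated sample gives stable counts when a seed is chosen once at construction, and varying counts when no seed is stored and each evaluation draws fresh randomness.

```python
import random


class LazySample:
    """Toy stand-in for a lazily evaluated sampled dataset (hypothetical, not Spark's API)."""

    def __init__(self, n, fraction, seed=None, pin_seed=True):
        self.n = n
        self.fraction = fraction
        # Assumption for illustration: "pinning" means choosing a seed once,
        # when the sample is defined, if the caller did not supply one.
        if seed is None and pin_seed:
            seed = random.randrange(2**63)
        self.seed = seed

    def count(self):
        # With seed=None, random.Random seeds itself from fresh entropy,
        # so every evaluation draws a different sample.
        rng = random.Random(self.seed)
        return sum(1 for _ in range(self.n) if rng.random() < self.fraction)


pinned = LazySample(10000, 0.1, pin_seed=True)
unpinned = LazySample(10000, 0.1, pin_seed=False)

pinned_counts = [pinned.count() for _ in range(10)]      # one repeated value
unpinned_counts = [unpinned.count() for _ in range(10)]  # values vary per call

print(pinned_counts)
print(unpinned_counts)
```

In PySpark itself, `DataFrame.sample` accepts an explicit `seed` argument, for example `spark.range(10000).sample(fraction=0.1, seed=42)`. Passing one should make repeated counts reproducible, although that has not been verified here against the affected Spark Connect versions.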