[SPARK-38271] PoissonSampler may output more rows than MaxRows - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.3, 3.1.2, 3.2.1, 3.3.0
Fix Version/s: 3.3.0, 3.2.2
Component/s: SQL
Labels:
- correctness

Description

scala> val df = spark.range(0, 1000)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> df.count
res0: Long = 1000

scala> df.sample(true, 0.999999, 10).count
res1: Long = 1004
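
The transcript above shows sampling with replacement returning more rows (1004) than the input holds (1000). A minimal sketch of why that can happen, assuming each input row is emitted a Poisson-distributed number of times with mean equal to the sampling fraction, as the sampler name in the issue title suggests. This is plain Scala, not Spark's implementation; `poisson` and `sampleWithReplacement` are hypothetical helpers introduced only for illustration.

```scala
import scala.util.Random

// Hypothetical helper: draw one Poisson(lambda) variate with Knuth's
// multiplication method. Not Spark code.
def poisson(lambda: Double, rng: Random): Int = {
  val limit = math.exp(-lambda)
  var k = 0
  var p = rng.nextDouble()
  while (p > limit) {
    k += 1
    p *= rng.nextDouble()
  }
  k
}

// Hypothetical helper: emit each row poisson(fraction) times. A row can
// appear 0, 1, 2, ... times, so the output size is not bounded by the
// input size.
def sampleWithReplacement(rows: Seq[Long], fraction: Double, seed: Long): Seq[Long] = {
  val rng = new Random(seed)
  rows.flatMap(row => Seq.fill(poisson(fraction, rng))(row))
}

val rows = 0L until 1000L
val sampled = sampleWithReplacement(rows, 0.999999, 10L)
println(s"input=${rows.size}, sampled=${sampled.size}")
```

With a fraction close to 1 the expected output size is close to the input size, so roughly half of all seeds would be expected to yield more rows than the input. Any row-count upper bound derived from the input size therefore does not hold for this kind of sampling.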

Issue Links

links to

[Github] Pull Request #35593 (zhengruifeng)

People

Assignee: Ruifeng Zheng

Reporter: Ruifeng Zheng

Votes: 0

Watchers: 3

Dates

Created: 21/Feb/22 08:03

Updated: 17/Jul/22 05:38

Resolved: 22/Feb/22 13:05