Details
- Type: Story
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.4.0
- Fix Version/s: None
- Component/s: None
Description
This is a follow-up to https://issues.apache.org/jira/browse/SPARK-40211.
The `scaleUpFactor` and `initialNumPartition` configs are not yet supported in the PySpark RDD `take` API
(see https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L2799):
it currently hardcodes `initialNumPartition` as 1 and `scaleUpFactor` as 4, so the PySpark RDD `take` API is inconsistent with the Scala API.
Could anyone familiar with PySpark help add support for this (referring to the Scala implementation)? A rough sketch of the idea follows.
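For illustration, here is a minimal sketch of the partition-scan schedule used by `RDD.take()`, with the two hardcoded constants lifted into SparkConf lookups. The config keys `spark.rdd.limit.initialNumPartitions` and `spark.rdd.limit.scaleUpFactor` mirror the Scala-side entries, but their exact spelling, and the helper names below, are assumptions for this sketch, not the current PySpark API:

```python
from pyspark import SparkConf

# Sketch only, not a patch: reproduces the growth schedule used by
# RDD.take() in python/pyspark/rdd.py, with the hardcoded 1 (initial
# partitions) and 4 (scale-up factor) replaced by conf lookups.
# The config key names are assumed to mirror the Scala side.

def take_limits(conf: SparkConf):
    initial = int(conf.get("spark.rdd.limit.initialNumPartitions", "1"))
    scale_up = int(conf.get("spark.rdd.limit.scaleUpFactor", "4"))
    return initial, scale_up

def num_parts_to_try(parts_scanned, items_found, num_wanted, initial, scale_up):
    """How many partitions the next take() iteration should scan."""
    if parts_scanned == 0:
        # First pass: currently hardcoded to 1 in pyspark/rdd.py.
        return initial
    if items_found == 0:
        # Nothing found yet: grow geometrically (currently hardcoded 4x).
        return parts_scanned * scale_up
    # Otherwise interpolate from the observed row density, overestimate
    # by 50%, and cap the per-iteration growth at scale_up.
    estimate = int(1.5 * num_wanted * parts_scanned / items_found) - parts_scanned
    return min(max(estimate, 1), parts_scanned * scale_up)

if __name__ == "__main__":
    conf = SparkConf().set("spark.rdd.limit.initialNumPartitions", "4")
    initial, scale_up = take_limits(conf)
    # Nothing found after scanning 4 partitions -> the next pass scans
    # 4 * scale_up = 16 partitions.
    print(num_parts_to_try(4, 0, 10, initial, scale_up))  # 16
```

The Scala `RDD.take()` already reads its scale-up factor from config, so wiring equivalent lookups into the Python loop would bring the two implementations back in line.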
Issue Links
- split from: SPARK-40211 "Allow executeTake() / collectLimit's number of starting partitions to be customized" (Resolved)