Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Description
In the current CartesianRDD implementation, suppose RDDC is the Cartesian product of RDDA and RDDB.
Each RDDA partition is then read by multiple RDDC partitions, and RDDB has the same problem.
As a result, when the RDDC partitions are computed, the data of each partition in RDDA or RDDB is serialized (and then transferred over the network) repeatedly. If RDDA or RDDB has not been persisted, it is also recomputed repeatedly.
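
To make the amount of repeated reading concrete, here is a minimal plain-Python sketch of the access pattern described above. It is a model, not Spark code: it assumes every RDDC partition corresponds to one (RDDA partition, RDDB partition) pair, and the function name `cartesian_partition_reads` is hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import product


def cartesian_partition_reads(num_a_partitions, num_b_partitions):
    """Count how many RDDC partitions read each parent partition.

    Illustrative model only (assumption): each RDDC partition is one
    (RDDA partition, RDDB partition) pair.
    """
    reads_a = Counter()
    reads_b = Counter()
    for a_idx, b_idx in product(range(num_a_partitions), range(num_b_partitions)):
        # Computing this RDDC partition reads one partition of each parent.
        reads_a[a_idx] += 1
        reads_b[b_idx] += 1
    return reads_a, reads_b


# Example: RDDA has 3 partitions, RDDB has 4, so RDDC has 12.
reads_a, reads_b = cartesian_partition_reads(3, 4)
print(dict(reads_a))  # {0: 4, 1: 4, 2: 4}
print(dict(reads_b))  # {0: 3, 1: 3, 2: 3, 3: 3}
```

In this model each RDDA partition is read once per RDDB partition and vice versa, so the number of fetches (or recomputations, when the parent is not persisted) grows with the partition count of the other side.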