Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Cartesian#cross currently uses the PTable#cogroup method to join the two data sets, which loads all data from both sides of the join into memory at the same time. This is a real problem for Cartesian joins because of the quantity of data being joined.
Using PTable#join instead of PTable#cogroup will reduce memory usage by about 50%, which can be the difference between a Cartesian join succeeding and failing with an OutOfMemoryError (OOME).
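
The difference in memory behaviour can be illustrated with a minimal plain-Java sketch. It does not use the Crunch API. The class CrossSketch, its methods and the buffered counter are hypothetical names made up for illustration, and the sketch assumes that a join-style implementation buffers only one side and streams the other, which is what the 50% figure above implies when both sides are of similar size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; this is not Crunch code.
final class CrossSketch {
    // Number of input elements the most recent call held in memory at once.
    static int buffered;

    // Cogroup-style: both sides are fully materialized before any pair is produced.
    static List<String> crossBufferBoth(Iterator<String> left, Iterator<Integer> right) {
        List<String> leftValues = new ArrayList<>();
        left.forEachRemaining(leftValues::add);
        List<Integer> rightValues = new ArrayList<>();
        right.forEachRemaining(rightValues::add);
        buffered = leftValues.size() + rightValues.size();

        // In a real pipeline the pairs would be emitted, not collected in a list.
        List<String> out = new ArrayList<>();
        for (String a : leftValues) {
            for (Integer b : rightValues) {
                out.add(a + "," + b);
            }
        }
        return out;
    }

    // Join-style (assumed behaviour): only the left side is buffered,
    // and the right side is consumed one element at a time.
    static List<String> crossBufferOne(Iterator<String> left, Iterator<Integer> right) {
        List<String> leftValues = new ArrayList<>();
        left.forEachRemaining(leftValues::add);
        buffered = leftValues.size();

        List<String> out = new ArrayList<>();
        while (right.hasNext()) {
            Integer b = right.next();
            for (String a : leftValues) {
                out.add(a + "," + b);
            }
        }
        return out;
    }
}
```

Both methods produce the same set of pairs, in a different order. The second holds only the left-side elements in memory, so the saving approaches half when the two sides are of similar size.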