Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Our CROSS implementation is very costly. We recently had a case where a user was doing a CROSS of 30 million records against 3K records, and it caused a lot of disk error exceptions during the shuffle phase. We need to add support for a map-side CROSS syntax:
C = CROSS A, B using 'replicate';
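The smaller table can be loaded into an in-memory list (replicated join uses a hashmap for the same purpose) and iterated through for each record in the bigger table. Since this runs entirely on the map side, it avoids the shuffle phase, so it should give a major performance boost and drastically reduce resource usage.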
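
To illustrate the idea, here is a minimal sketch in Python (not Pig's actual Java internals). The function name `replicated_cross` and the sample relations are hypothetical, and the sketch assumes the smaller relation fits in memory on every map task.

```python
from typing import Iterable, Iterator, Tuple


def replicated_cross(big: Iterable[Tuple], small: Iterable[Tuple]) -> Iterator[Tuple]:
    """Hypothetical map-side cross: stream `big`, hold `small` in memory."""
    # Load the smaller relation into a list once (assumed to fit in memory).
    small_rows = list(small)
    # Stream the bigger relation record by record; no shuffle or sort is involved.
    for big_row in big:
        for small_row in small_rows:
            # Each output record is the concatenation of the two input records.
            yield big_row + small_row


# Tiny stand-ins for the large relation A and the small relation B.
A = [(1, "a"), (2, "b"), (3, "c")]
B = [("x",), ("y",)]
C = list(replicated_cross(A, B))
print(len(C))  # 6
```

The output size is the same as for a regular CROSS (30 million x 3K is 90 billion records in the case above). What the sketch avoids is moving and spilling both inputs through the shuffle.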