Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Description
Replicated joins are a common way to improve performance when joining a large dataset with a small one. The smaller dataset is loaded into memory in each mapper or reducer task and is joined against the records of the larger dataset as the MapReduce job processes them.
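The idea can be illustrated with a minimal sketch in plain Python. This is not the implementation this issue delivers and it uses no Hadoop API; the names `build_lookup`, `replicated_join`, `users`, and `events` are hypothetical and chosen only for illustration. It assumes the small dataset fits in memory and that both datasets are (key, value) pairs joined on the key (inner join).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (join key, value); an assumed record shape


def build_lookup(small: Iterable[Record]) -> Dict[str, List[str]]:
    """Load the small dataset into an in-memory hash table keyed by join key.

    In a real job this would happen once per task, before any records
    of the large dataset are processed.
    """
    lookup: Dict[str, List[str]] = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return lookup


def replicated_join(
    large: Iterable[Record], lookup: Dict[str, List[str]]
) -> Iterator[Tuple[str, str, str]]:
    """Stream the large dataset and probe the in-memory table per record.

    Records with no match in the small dataset are dropped (inner join).
    """
    for key, large_value in large:
        # .get avoids inserting empty entries into the defaultdict
        for small_value in lookup.get(key, ()):
            yield key, large_value, small_value


# Hypothetical sample data: a small "users" table and a larger "events" stream.
users = [("u1", "alice"), ("u2", "bob")]
events = [("u1", "login"), ("u3", "login"), ("u2", "purchase"), ("u1", "logout")]

lookup = build_lookup(users)
joined = list(replicated_join(iter(events), lookup))
for row in joined:
    print(row)
```

Only the small dataset is held in memory; the large one is consumed as a stream, so memory use in this sketch depends on the size of the small side alone.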