Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Not A Problem
Description
The Wikipedia ingest Map job uses a derivative of FileInputFormat that launches one mapper per file. Given the partitioning strategy and workload distribution, it makes sense to launch multiple mappers per file instead. Each mapper can then take a chunk of the articles in the file, using the same partitioning strategy as the assignment of row IDs.
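
As a rough illustration of the proposal, the sketch below shows how several mappers reading the same file could each claim a disjoint chunk of its articles. It is not the ingest job's actual code: the class name ArticleChunker, the methods partition and articlesForChunk, and the hash-modulo partition function are all assumptions made for the example, since the ticket does not say how row IDs are assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ingest job: decides which articles
// of a shared input file belong to which mapper.
final class ArticleChunker {

    // Assumed partition function (hash modulo). The real job would reuse
    // whatever function it already uses to assign row IDs.
    static int partition(int articleId, int numPartitions) {
        return Math.floorMod(Integer.hashCode(articleId), numPartitions);
    }

    // Returns the article IDs that the mapper with index chunkIndex should
    // process; every other mapper skips them.
    static List<Integer> articlesForChunk(List<Integer> articleIds, int chunkIndex, int numChunks) {
        List<Integer> mine = new ArrayList<>();
        for (int id : articleIds) {
            if (partition(id, numChunks) == chunkIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(10, 11, 12, 13, 14, 15, 16, 17);
        int numChunks = 3; // number of mappers launched for this one file
        for (int chunk = 0; chunk < numChunks; chunk++) {
            System.out.println("mapper " + chunk + " -> " + articlesForChunk(ids, chunk, numChunks));
        }
    }
}
```

In this sketch every mapper still scans the whole file and discards the articles it does not own, so only the per-article processing is spread across mappers, not the reading of the file. Whether that trade-off is acceptable depends on how expensive reading is relative to processing.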