Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Hadoop job, map fetches data from external systems
Description
Consider Hadoop jobs in which the maps fetch data from external systems and emit it, and the reducers are identity reducers. These jobs process a very large amount of data. A cluster may contain slow nodes, so some reducers run twice as slow as their counterparts, which results in a long tail. Speculative execution would help greatly in such cases. However, Hadoop currently allows speculative execution to be enabled only for maps and reducers together. Speculative map attempts repeat the fetches from the external systems, which overloads those systems and hurts map performance.
Speculative execution only on reducers would be a great way to solve this problem.
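
The trade-off described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hadoop code: the class `ReduceOnlySpeculation`, its methods, and all numbers are hypothetical. It assumes that a speculative reduce attempt starts after a fixed delay on a normal-speed node, and that a speculated map may trigger at most one extra fetch from the external system.

```java
// Toy model of per-phase speculative execution (hypothetical, not a Hadoop API).
final class ReduceOnlySpeculation {

    /**
     * Time at which the last reducer finishes.
     * Assumption: with speculation, every reducer gets a backup attempt that
     * starts at launchDelay on a normal-speed node and runs for normalDuration;
     * the task completes when the faster of the two attempts completes.
     */
    static double reducePhaseTime(double[] durations, double normalDuration,
                                  double launchDelay, boolean speculate) {
        double finish = 0.0;
        for (double d : durations) {
            double taskTime = speculate ? Math.min(d, launchDelay + normalDuration) : d;
            finish = Math.max(finish, taskTime);
        }
        return finish;
    }

    /**
     * Upper bound on fetches issued to the external system.
     * Assumption: each map attempt fetches its whole input once, and a
     * speculated map has at most two attempts.
     */
    static int maxExternalFetches(int mapTasks, boolean speculateMaps) {
        return speculateMaps ? 2 * mapTasks : mapTasks;
    }

    public static void main(String[] args) {
        // Three normal reducers and one running twice as slow (illustrative values).
        double[] reducers = {10.0, 10.0, 10.0, 20.0};

        System.out.println(reducePhaseTime(reducers, 10.0, 2.0, false)); // prints 20.0
        System.out.println(reducePhaseTime(reducers, 10.0, 2.0, true));  // prints 12.0

        // Reduce-only speculation keeps the external load at one fetch per map.
        System.out.println(maxExternalFetches(100, false)); // prints 100
        System.out.println(maxExternalFetches(100, true));  // prints 200
    }
}
```

In this model, enabling speculation for the reduce phase alone shortens the tail from 20 to 12 time units while the external system still sees at most one fetch per map. Hadoop releases that include per-phase control expose it as separate settings for the two phases (`mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution`, later renamed `mapreduce.map.speculative` and `mapreduce.reduce.speculative`).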