Details
Type: Sub-task
Status: Open
Priority: Minor
Resolution: Unresolved
Description
To decrease job startup time we should implement worker pools.
Worker pools should start BSPTask JVMs up front, based on the configured task capacity.
This should greatly improve cold-start time for jobs. However, the startup cost is quite low compared to the runtime of a long-running Hama task.
The idea is from http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a (slide 4). Google Tenzing uses this, and I have read that the Gmail Priority Inbox jobs also use this kind of task reuse.
This will be the first of a number of tasks that profile and improve the startup time of jobs and of the cluster. (An umbrella issue will follow.)
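
To make the proposal more concrete, here is a rough sketch of what such a pool could look like. It is only an illustration under assumptions: WorkerPool, launcher, acquire and release are hypothetical names and not part of the Hama API, and the launcher stands in for whatever would actually fork a BSPTask JVM. The pool pays the launch cost once, sized by the configured task capacity, and then hands out already-started workers to incoming tasks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical pool of pre-started workers; not part of the Hama API. */
class WorkerPool<W> {
    private final BlockingQueue<W> idle;
    private final int capacity;

    /**
     * Eagerly starts `capacity` workers so the launch cost is paid once,
     * before any job arrives. `launcher` is an assumption: it stands in for
     * the code that would really start a task JVM.
     */
    WorkerPool(int capacity, Supplier<W> launcher) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
        this.idle = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            idle.add(launcher.get());
        }
    }

    /** Hands out a warm worker, or null if every worker is currently busy. */
    W acquire() {
        return idle.poll();
    }

    /** Returns a worker to the pool so the next task can reuse it. */
    void release(W worker) {
        if (!idle.offer(worker)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    /** Number of workers that are started and waiting for a task. */
    int idleCount() {
        return idle.size();
    }

    int capacity() {
        return capacity;
    }

    public static void main(String[] args) {
        // Strings stand in for worker JVM handles in this demo.
        int[] counter = {0};
        WorkerPool<String> pool = new WorkerPool<>(2, () -> "worker-" + (counter[0]++));
        String w = pool.acquire();
        System.out.println("got " + w + ", idle left: " + pool.idleCount());
        pool.release(w);
        System.out.println("idle after release: " + pool.idleCount());
    }
}
```

In this sketch a task that finds the pool empty gets null back and has to decide whether to wait or to fall back to a cold start. A real implementation would also need to deal with worker crashes and with cleaning up state between tasks, which is left out here.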