Hi, this is a comparatively dirty hack that I made on top of the current source code. I would like someone to review it, especially because I have changed a few things that were assumed to be multi-threaded to be single-threaded.
While working on it, I realized that this won't necessarily improve performance, because the resource requirements for Hama are different from those of Hadoop. This change would move the mapper tasks closer to the input, as in Hadoop. But in Hama, a task continues running on the same machine throughout its lifetime. If, in search of data locality, the tasks get scheduled such that communication between the nodes is costlier than normal (e.g. tasks resident in separate racks), then this change would degrade performance.
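To make the trade-off concrete, here is a rough sketch of a toy cost model. Everything in it is an assumption made for illustration: the `PlacementCost` and `Placement` names, the cost constants, and the idea that every pair of tasks exchanges messages in every superstep. None of it is Hama's actual scheduler code.

```java
import java.util.Arrays;
import java.util.List;

/** Toy cost model (hypothetical, not part of Hama) comparing two task placements. */
final class PlacementCost {

    /** Hypothetical description of where one task runs. */
    static final class Placement {
        final String rack;           // rack the task was scheduled on
        final boolean inputIsLocal;  // true if the task's input split is on its own node

        Placement(String rack, boolean inputIsLocal) {
            this.rack = rack;
            this.inputIsLocal = inputIsLocal;
        }
    }

    /** One-off cost, paid once, for tasks that must read their input remotely. */
    static long inputCost(List<Placement> tasks, long remoteReadCost) {
        long cost = 0;
        for (Placement p : tasks) {
            if (!p.inputIsLocal) {
                cost += remoteReadCost;
            }
        }
        return cost;
    }

    /** Per-superstep cost: assumes every pair of tasks communicates, and only cross-rack pairs cost anything. */
    static long communicationCost(List<Placement> tasks, long crossRackCost) {
        long cost = 0;
        for (int i = 0; i < tasks.size(); i++) {
            for (int j = i + 1; j < tasks.size(); j++) {
                if (!tasks.get(i).rack.equals(tasks.get(j).rack)) {
                    cost += crossRackCost;
                }
            }
        }
        return cost;
    }

    /** Input is read once, but communication is paid in every superstep. */
    static long totalCost(List<Placement> tasks, long remoteReadCost,
                          long crossRackCost, int supersteps) {
        return inputCost(tasks, remoteReadCost)
                + (long) supersteps * communicationCost(tasks, crossRackCost);
    }

    public static void main(String[] args) {
        // Data-local placement: every task reads locally, but the tasks are spread over four racks.
        List<Placement> dataLocal = Arrays.asList(
                new Placement("rack1", true), new Placement("rack2", true),
                new Placement("rack3", true), new Placement("rack4", true));
        // Topology-aware placement: all tasks share one rack, but three of them read remotely.
        List<Placement> sameRack = Arrays.asList(
                new Placement("rack1", true), new Placement("rack1", false),
                new Placement("rack1", false), new Placement("rack1", false));

        System.out.println("data-local, 20 supersteps: " + totalCost(dataLocal, 10, 1, 20));
        System.out.println("same-rack,  20 supersteps: " + totalCost(sameRack, 10, 1, 20));
    }
}
```

With these made-up constants the data-local placement is cheaper for a single superstep, while the same-rack placement is cheaper once the job runs for many supersteps, because the remote read is paid once and the cross-rack traffic is paid every superstep.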
While discussing the issue, Thomas and I felt that network topology information should be more important for scheduling jobs than data locality for the first superstep. We felt that HAMA-519 could be a good start for providing input for this. I see that it is already scheduled for 0.6. I can provide the test cases if we decide to push this into the 0.5 release.
From the patch, I would like to know whether making a single TaskWorker schedule all tasks is fine or not. This will be important for my future patches. So even if this patch is not really important, I would appreciate a review.