While that is true, we need all 3 trunks (common, hdfs, and mapreduce) to depend on the current SNAPSHOT of each other. Otherwise, a change in common will break all of them. Using a fixed version of mapreduce for HDFS testing wouldn't work, because that fixed version would eventually be broken by changes in common. In short, we would need to depend on a specific SNAPSHOT version of mapreduce. When would that be updated? Who would update it?
Ah, I didn't realize that. That definitely is a problem with letting HDFS use a particular version of mapred. Thanks, Owen, for making this clear!
The majority of the move is about the tools and benchmarks that use HDFS and MapReduce, which are better served being in MapReduce. Some of the tests should be recoded without MapReduce and pushed back into HDFS.
I welcome this.
In general, I share your concerns w.r.t. cyclic dependencies; I just felt this might not be the right fix. Even so, I don't pretend to have the right solution for this myself.
One observation from looking at the cyclic dependencies is that the sore points are the MiniMR and MiniDFS clusters. Setting these aside, mapred and hdfs are, I guess, more or less completely independent. So one solution could be to create MR and DFS cluster interfaces and have mapred/hdfs code against those interfaces. Thoughts?
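To make the interface idea concrete, here is a minimal sketch. All names below (DfsCluster, MrCluster, FakeDfsCluster, the URI) are hypothetical illustrations, not actual Hadoop classes: the point is only that test code could depend on a small shared interface instead of on the concrete MiniDFSCluster/MiniMRCluster classes in the other project.

```java
// Hypothetical sketch: shared cluster interfaces that test code could
// program against, so hdfs tests need not depend on mapred's concrete
// mini-cluster class and vice versa.
public class ClusterInterfaceSketch {

    /** Minimal contract a mini DFS cluster would expose to tests. */
    interface DfsCluster {
        String getFileSystemUri();  // e.g. "hdfs://localhost:<port>"
        void shutdown();
    }

    /** Minimal contract a mini MR cluster would expose to tests. */
    interface MrCluster {
        String getJobTrackerAddress();
        void shutdown();
    }

    /** A trivial in-memory fake, showing tests can run against the interface alone. */
    static class FakeDfsCluster implements DfsCluster {
        public String getFileSystemUri() { return "hdfs://localhost:9000"; }
        public void shutdown() { /* nothing to tear down in the fake */ }
    }

    public static void main(String[] args) {
        // A test written against DfsCluster works with any implementation:
        // the real mini cluster in hdfs, or a fake like this one.
        DfsCluster dfs = new FakeDfsCluster();
        System.out.println(dfs.getFileSystemUri());
        dfs.shutdown();
    }
}
```

The interfaces would have to live somewhere both projects can see (e.g. common), which is exactly the layering question this thread is debating.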
By the way, I am OK with not reverting this patch for now, as it at least addresses the cyclic dependency problem in one particular way. But we should definitely pursue other solutions, if any, perhaps in a separate JIRA issue.