Tsuyoshi OZAWA, please go ahead and take a crack at it if you like. I have not started any work related to this jira yet.
I will propose what I was thinking of as one possible solution, given the current features of Tez. Feel free to change or tear apart the proposal, or to suggest improvements to core functionality in Tez.
Background info: Tez supports the concepts of vertices and edges. On each edge and vertex, a user can plug in custom logic to affect different aspects of the run-time. Currently, some pieces of this are implemented to support things such as dynamically sizing the number of reduce tasks to run. Via events, information about the outputs of a map stage can be sampled in the AM to determine how many reducers to run. Once this is decided, the user logic in the AM can then route the map output information (carried within DataMovementEvents) to the appropriate reducer to ensure that partitions are assigned correctly.
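To make the reducer-sizing flow above concrete, here is a rough, framework-neutral sketch in Python. It is not the Tez API: the function names, the bytes-per-reducer target, and the (map_task, partition, location) event tuples are all assumptions made for illustration.

```python
import math

# Hypothetical model of the AM-side user logic; none of these names are Tez APIs.

def choose_num_reducers(sampled_partition_bytes, target_bytes_per_reducer, max_reducers):
    """Pick a reducer count from sampled map output sizes (one entry per partition)."""
    total = sum(sampled_partition_bytes)
    wanted = max(1, math.ceil(total / target_bytes_per_reducer))
    # Never use more reducers than there are partitions or than the configured cap.
    return min(wanted, max_reducers, len(sampled_partition_bytes))

def route_partitions(num_partitions, num_reducers):
    """Assign contiguous ranges of partitions to reducers."""
    return {p: p * num_reducers // num_partitions for p in range(num_partitions)}

def route_events(events, partition_to_reducer):
    """Group map output events by the reducer that owns their partition."""
    routed = {}
    for map_task, partition, location in events:
        reducer = partition_to_reducer[partition]
        routed.setdefault(reducer, []).append((map_task, partition, location))
    return routed
```

Contiguous ranges are used here only because they keep each partition on exactly one reducer; any other assignment with that property would serve the sketch equally well.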
Today, a MapReduce job consists of a Map vertex connected to a Reduce vertex via a Shuffle edge. For the above, I was thinking along the lines of a Map vertex followed by a Combiner Vertex which is then connected to the Reduce Vertex. The edge between the Map and combiner vertex could also just be a shuffle.
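The two topologies described above can be written down as plain data. This is only an illustrative model, not the Tez DAG API; the build_dag helper is hypothetical, and labelling the Combiner-to-Reduce edge as a shuffle is an assumption (the proposal only says the Map-to-Combiner edge could be one).

```python
# Hypothetical, framework-neutral description of the DAG shapes; not the Tez API.

def build_dag(with_combiner):
    """Return (vertices, edges), where each edge is (source, destination, edge_type)."""
    if not with_combiner:
        # Today's MapReduce job: Map -> Reduce over a shuffle edge.
        return ["Map", "Reduce"], [("Map", "Reduce", "shuffle")]
    # Proposed shape: Map -> Combiner -> Reduce.
    vertices = ["Map", "Combiner", "Reduce"]
    edges = [
        ("Map", "Combiner", "shuffle"),
        # Assumption: the second edge is also a shuffle.
        ("Combiner", "Reduce", "shuffle"),
    ]
    return vertices, edges
```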
Using an approach similar to the one for reducer dynamism, the combiner vertex could use events generated by the framework to learn where the Map tasks are running. Based on this, the user logic could then decide how many tasks to create for the combiner vertex (for example, one per physical node or one per rack) and also define the locality requirements. Note that the shuffle edge works by the map task generating an event that publishes the location of the map output, which is then passed to the next stage's input. Using this, various optimizations could be done too. In some cases, the combiner vertex may decide to do no work and therefore pass the event generated by the map directly to the reduce. This may require changes in the current shuffle input/output pairs though.
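A rough sketch of that user logic, again in framework-neutral Python rather than the Tez API. The function names, the (map_task_id, host) tuples, the host-to-rack mapping, and the "fewer than two inputs means bypass" rule are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical model of the combiner vertex's user logic; not Tez APIs.

def plan_combiner_tasks(map_outputs, rack_of=None):
    """Create one combiner task per node, or per rack when rack_of is given.

    map_outputs: list of (map_task_id, host) learned from framework events.
    rack_of: optional dict mapping host -> rack.
    """
    groups = defaultdict(list)
    for task_id, host in map_outputs:
        key = rack_of[host] if rack_of is not None else host
        groups[key].append(task_id)
    # Each task carries a locality hint and the map outputs it should combine.
    return [{"locality": key, "inputs": ids} for key, ids in sorted(groups.items())]

def split_bypass(plan, min_inputs=2):
    """Separate tasks worth running from map outputs that can skip the combiner."""
    combine = [t for t in plan if len(t["inputs"]) >= min_inputs]
    # Assumed rule: a group with too few inputs has nothing to combine, so its
    # map events would be passed straight through to the reduce stage.
    bypass = [i for t in plan if len(t["inputs"]) < min_inputs for i in t["inputs"]]
    return combine, bypass
```

With four map outputs spread over three hosts in two racks, the per-node plan yields three combiner tasks and the per-rack plan yields two; split_bypass then forwards the outputs of any group too small to benefit from combining.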
Tez is still some time away from being able to dynamically introduce new vertices into the DAG. At some point, the combiner vertex would be dynamically introduced by user logic, but for now it might be a good start to implement it via a static DAG with optimizations to bypass it as needed.
There is some reference information here: http://hortonworks.com/hadoop/tez/ (we plan to create better docs and publish them to the Apache Tez website soon).