We found that certain Hadoop Map/Reduce settings specified in the site config files do not take effect in Hive jobs running on Tez, because the Tez site configs do not contain the equivalent settings.
In Yahoo's case, the problem was that, at the time, there was no mapping between MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART and TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION. There were situations where significant capacity on production clusters was held idle, doing nothing while waiting for slow tasks to complete. This would have been avoided had the mappings been in place.
Tez provides a DeprecatedKeys utility class to help map MR settings to Tez settings. Hive should use it so that the mappings stay in sync.
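
To make the intended behaviour concrete, here is a minimal, self-contained sketch of the translation step. It is not the actual Hive or Tez code: the class MrToTezTranslator and its translate method are hypothetical, the two key strings are assumed to be the literal values of the constants named above, and the rule that an explicitly set Tez value wins over the MR value is an assumption as well. In Hive, the MR-to-Tez table would come from DeprecatedKeys rather than being written by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hive or Tez.
public class MrToTezTranslator {

    // Assumed literal value of MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART.
    static final String MR_SLOWSTART =
        "mapreduce.job.reduce.slowstart.completedmaps";

    // Assumed literal value of TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION.
    static final String TEZ_MAX_SRC_FRACTION =
        "tez.shuffle-vertex-manager.max-src-fraction";

    // Returns a copy of conf in which every MR key listed in mrToTez has
    // its value copied to the matching Tez key, unless that Tez key is
    // already set explicitly (assumed policy: the Tez setting wins).
    static Map<String, String> translate(Map<String, String> conf,
                                         Map<String, String> mrToTez) {
        Map<String, String> out = new HashMap<>(conf);
        for (Map.Entry<String, String> e : mrToTez.entrySet()) {
            String mrValue = conf.get(e.getKey());
            if (mrValue != null && !conf.containsKey(e.getValue())) {
                out.put(e.getValue(), mrValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the table DeprecatedKeys would supply.
        Map<String, String> mrToTez = new HashMap<>();
        mrToTez.put(MR_SLOWSTART, TEZ_MAX_SRC_FRACTION);

        // A site config that only sets the MR-side slow-start threshold.
        Map<String, String> siteConf = new HashMap<>();
        siteConf.put(MR_SLOWSTART, "0.95");

        Map<String, String> jobConf = translate(siteConf, mrToTez);
        System.out.println(TEZ_MAX_SRC_FRACTION + " = "
            + jobConf.get(TEZ_MAX_SRC_FRACTION));
    }
}
```

With a step like this applied when Hive builds the Tez job configuration, a slow-start threshold set only on the MR side would carry over to the Tez job instead of being silently ignored.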
(Note to self: YHIVE-883)