Affects Version/s: None
Fix Version/s: None
HiveQL has the keywords MAP and REDUCE, which function as syntactic sugar for SELECT TRANSFORM. This behavior is not at all intuitive, because Hive does no checking or verification to ensure that the user's intention is met.
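As a minimal illustration, the two queries below are treated the same way, because the MAP keyword is rewritten into a plain SELECT TRANSFORM. The table `src(key, value)` and the identity script `cat` are placeholders chosen for this sketch, not taken from the original report.

```sql
-- Hypothetical table src(key STRING, value STRING); 'cat' echoes its input unchanged.
-- Written with the MAP keyword:
FROM src
MAP src.key, src.value
USING 'cat'
AS k, v;

-- The equivalent SELECT TRANSFORM form. Neither spelling guarantees
-- that the script runs in a map task.
SELECT TRANSFORM (src.key, src.value)
USING 'cat'
AS (k, v)
FROM src;
```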
Specifically, Hive may see a MAP query and simply tack the transform script onto the end of a reduce job (so the user says MAP but Hive does a REDUCE), or, more dangerously, vice versa. Hive's whole point is to sit on top of a MapReduce framework and allow transformations in the mapper or the reducer, so it is inappropriate for Hive to silently ignore a clear instruction from the user to MAP or to REDUCE the data using a script.
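A sketch of both mismatches, assuming a hypothetical table `src(key, value)` and the identity script `cat` as stand-ins. The exact plan can depend on the Hive version; running `EXPLAIN` on each query is one way to check which phase the script operator actually lands in.

```sql
-- Case 1: MAP whose script runs in the reduce phase.
-- The inner CLUSTER BY forces a shuffle, and the outer transform is
-- appended to the reduce side of that job instead of starting a new map phase.
FROM (
  FROM src
  SELECT key, value
  CLUSTER BY key
) t
MAP t.key, t.value
USING 'cat'
AS k, v;

-- Case 2: REDUCE whose script runs in the map phase.
-- Nothing forces a shuffle (no DISTRIBUTE BY / CLUSTER BY), so the query
-- compiles to a map-only job and the "reducer" script runs in the mappers.
FROM src
REDUCE src.key, src.value
USING 'cat'
AS k, v;
```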
Better behavior would be for Hive, on seeing MAP, to start a new MapReduce step and run the script in the mapper (even if it would otherwise run in the reducer). On seeing REDUCE, Hive should begin a reduce step if necessary: if the current system would already tack the REDUCE script onto the end of a reduce job, do so; if not, treat the 0th column as the reduce key, emit a warning saying this has been done, and force a reduce job.
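Absent such a change, a user can approximate the proposed REDUCE behavior by hand by distributing on the first column explicitly, so that a shuffle is guaranteed to happen before the script runs. A minimal sketch, again using a hypothetical table `src(key, value)` and the identity script `cat` as stand-ins:

```sql
-- Explicitly distribute on the first column (the "0th column" above),
-- which forces a reduce step; the REDUCE script then runs in the reducers.
FROM (
  FROM src
  SELECT key, value
  DISTRIBUTE BY key
) t
REDUCE t.key, t.value
USING 'cat'
AS k, v;
```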
Acceptable behavior would be to throw an error or a warning whenever the user's clearly stated intent is going to be ignored, for example: "Warning: User used MAP keyword, but transformation will occur in the reduce phase" or "Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY / CLUSTER BY column. Transformation will occur in the map phase."
| Field | Original Value | New Value |
|---|---|---|
| Summary | Make MAP and REDUCE work as expected or add warnings | Deprecate, remove, or fix MAP and REDUCE syntax. |
| Component/s | | SQL [ 12315100 ] |
| Issue Type | Improvement [ 4 ] | Bug [ 1 ] |
| Priority | Major [ 3 ] | Minor [ 4 ] |