Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 0.6.0
Description
Currently, if a join is followed by a limit operator, the reducer still needs to do a lot of work even after the limit is reached.
A plan could look like:
ExecReducer -> ExtractOperator -> Limit Operator -> ...
In Hadoop 0.20, we can override the reduce API to stop taking rows from the underlying file, but for pre-0.20, it is not overridable. What we can do is put the limit number in the ExecReducer metadata during the Hive optimization phase.
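The early-exit idea can be sketched as a reducer-side counter that stops consuming input once the pushed-down limit is reached. This is a minimal illustration, not Hive's actual API: the class and method names below are hypothetical, and the real change would live in ExecReducer and the optimizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a reducer that short-circuits once the limit
// stored in the plan metadata is reached, so the driver loop can stop
// pulling rows instead of draining the whole input.
public class LimitAwareReducer {
    private final int limit;   // limit number pushed down from the Limit operator
    private int rowsEmitted = 0;
    private final List<String> output = new ArrayList<>();

    public LimitAwareReducer(int limit) {
        this.limit = limit;
    }

    // Returns false once the limit is reached; the caller uses this
    // signal to stop feeding rows from the underlying file.
    public boolean reduce(String row) {
        if (rowsEmitted >= limit) {
            return false;        // limit already reached: do no more work
        }
        output.add(row);
        rowsEmitted++;
        return rowsEmitted < limit;
    }

    public List<String> getOutput() {
        return output;
    }
}
```

With a limit of 2, the third and later rows are never processed, which is the saving the pre-0.20 metadata approach is after.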