A few thoughts on this...
The idea here is to implement a limited version of the LIMIT clause as it appears in MySQL. I am not planning to implement the offset part of it. Basically, I want to support
SELECT .... LIMIT N
where N is the number of rows to be returned from the query block (note this is only for the selects in the query block).
While generating the plan for the query, once the plan for the query block has been generated I can add the plan fragment
LimitMap -> ReduceSink -> LimitReduce
So for example if the plan of the query block is something like...
opX1 -> opX2 .... -> ReduceSink -> reduce op -> opY1 -> opY2 ...
This would look like
opX1 -> opX2 ... -> ReduceSink -> reduce op -> opY1 -> opY2 ... -> LimitMap -> ReduceSink -> LimitReduce
This should also work seamlessly with plans that do not have a ReduceSink, i.e. plans that look like
opX1 -> opX2 ... -> opXn
will look like
opX1 -> opX2 ... -> opXn -> LimitMap -> ReduceSink -> LimitReduce
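To make the rewrite concrete, here is a rough Python sketch. It is not Hive code: operators are modelled as plain strings, and add_limit is a made-up helper name used only for illustration. The point is that the same three-operator fragment is appended to the end of the query block's plan whether or not that plan already contains a ReduceSink.

```python
# Hypothetical model: a plan is just an ordered list of operator names.
LIMIT_FRAGMENT = ["LimitMap", "ReduceSink", "LimitReduce"]

def add_limit(plan):
    """Return a new plan with the limit fragment appended at the end."""
    return list(plan) + LIMIT_FRAGMENT

# Plan that already has a ReduceSink.
with_reduce = ["opX1", "opX2", "ReduceSink", "reduce op", "opY1", "opY2"]
# Map-only plan.
map_only = ["opX1", "opX2", "opXn"]

print(" -> ".join(add_limit(with_reduce)))
print(" -> ".join(add_limit(map_only)))
```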
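Suppose we are calculating LIMIT N. The LimitMap will pass through at most N rows from each mapper, and the LimitReduce will return N rows out of the ones it receives from the mappers (or all of them, if fewer than N arrive). We have to run this map/reduce job with 1 reducer, since only a single reducer sees all the candidate rows and can cap the final output at N.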
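A minimal sketch of these semantics in Python, assuming each mapper's input is a plain list of rows. The names limit_map, limit_reduce and run_limit are hypothetical and only stand in for the operators described above; this is not how the operators would actually be written.

```python
from itertools import islice

def limit_map(rows, n):
    """Map side: pass through at most n rows from one mapper's input."""
    return list(islice(rows, n))

def limit_reduce(mapper_outputs, n):
    """Single reducer: return at most n of the rows sent by all mappers."""
    merged = (row for output in mapper_outputs for row in output)
    return list(islice(merged, n))

def run_limit(splits, n):
    """Simulate the job: one limit_map per split, then one limit_reduce."""
    mapped = [limit_map(split, n) for split in splits]
    return limit_reduce(mapped, n)

splits = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
print(run_limit(splits, 3))  # [1, 2, 3] in this simulation
```

With M mappers the reducer receives at most M * N rows, which is why trimming on the map side is worthwhile. In a real job the order in which rows reach the reducer is not fixed, so which N rows come back is arbitrary; the simulation above just happens to read the splits in order.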