kevinwilfong requested code review of "HIVE-2621 [jira] Allow multiple group bys with the same input data and spray keys to be run on the same reducer.".
The meaningful changes are all in how the plan is generated.
If the conf variable has been set, the subclauses are first grouped by their group by keys and distinct keys. To facilitate this, I added a wrapper class for ExprNodeDesc whose equals method behaves like the isSame method.
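A minimal sketch of the wrapper idea, not the code in the patch. `Expr` is a hypothetical stand-in for ExprNodeDesc, and the sketch assumes that its default equals compares object identity while isSame compares structure. `ExprWrapper` and its hashCode choice are likewise illustrative names and assumptions.

```java
// Hypothetical stand-in for ExprNodeDesc: equals() is inherited from Object
// (identity comparison), while isSame() compares structure.
class Expr {
    final String text;

    Expr(String text) {
        this.text = text;
    }

    boolean isSame(Object o) {
        return o instanceof Expr && ((Expr) o).text.equals(text);
    }
}

// Hypothetical wrapper: equals() delegates to isSame(), so wrapped
// expressions can be used as keys in hash-based collections.
final class ExprWrapper {
    private final Expr expr;

    ExprWrapper(Expr expr) {
        this.expr = expr;
    }

    Expr get() {
        return expr;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ExprWrapper && expr.isSame(((ExprWrapper) o).expr);
    }

    // Must agree with equals(): expressions that are isSame() hash alike.
    @Override
    public int hashCode() {
        return expr.text.hashCode();
    }
}
```

With a wrapper like this, two structurally identical key expressions from different subclauses land in the same bucket of a HashMap or HashSet, which is what grouping by keys needs.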
If the conf variable is not set, I create a single group containing all the subclauses.
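The grouping step could be sketched roughly as follows. `Subclause` and `SubclauseGrouper` are hypothetical names, keys are simplified to strings, and comparing key sets without regard to order is an assumption of this sketch rather than something the patch is known to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified view of one subclause of a multi-insert.
class Subclause {
    final String dest;
    final Set<String> groupByKeys;
    final Set<String> distinctKeys;

    Subclause(String dest, List<String> groupByKeys, List<String> distinctKeys) {
        this.dest = dest;
        this.groupByKeys = new HashSet<>(groupByKeys);
        this.distinctKeys = new HashSet<>(distinctKeys);
    }
}

class SubclauseGrouper {
    // Returns the groups of subclauses that would share one reducer.
    static List<List<Subclause>> group(List<Subclause> subclauses, boolean confSet) {
        List<List<Subclause>> result = new ArrayList<>();
        if (!confSet) {
            // Conf variable not set: a single group holding every subclause.
            result.add(new ArrayList<>(subclauses));
            return result;
        }
        // Conf variable set: bucket by (group by keys, distinct keys).
        Map<List<Set<String>>, List<Subclause>> byKeys = new LinkedHashMap<>();
        for (Subclause s : subclauses) {
            List<Set<String>> key = Arrays.asList(s.groupByKeys, s.distinctKeys);
            byKeys.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        result.addAll(byKeys.values());
        return result;
    }
}
```

For example, three subclauses with keys (a, b), (a, b) and (c, b) would yield two groups when the conf variable is set and one group otherwise.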
Then, provided certain conditions are met (e.g. the conf variable is set, there is a group by and there are aggregations, and the skew conf variable hasn't been set), I create the new plan for each group; otherwise the old plan is produced.
To start, I generate the common filter by 'or'ing the filters of the group's subclauses. This goes into a select operator, which goes into a new reduce operator. The reduce operator is like the typical 1 MR group by reduce operator, except that to generate the reduce values it loops over the aggregations of each of the group's subclauses and the columns used in their where clauses.
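As an illustration of the common filter, here is a small sketch that models filters as java.util.function.Predicate instead of Hive expression trees. `CommonFilter` is a hypothetical name. Treating a subclause without a where clause as "pass everything" is an assumption of this sketch; it follows from the logic (that subclause needs every row), but the patch may handle the case differently.

```java
import java.util.List;
import java.util.function.Predicate;

class CommonFilter {
    // Combines the subclauses' filters with OR. A null entry stands for a
    // subclause that has no where clause.
    static <R> Predicate<R> orAll(List<Predicate<R>> filters) {
        Predicate<R> combined = null;
        for (Predicate<R> f : filters) {
            if (f == null) {
                // One unfiltered subclause needs every row, so nothing can
                // be dropped before the reducer.
                return row -> true;
            }
            combined = (combined == null) ? f : combined.or(f);
        }
        // No subclauses at all: nothing to filter on.
        return combined == null ? row -> true : combined;
    }
}
```

A row is dropped before the shuffle only if no subclause in the group wants it; each subclause's own filter still has to be applied again after the reducer.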
This goes into a forward operator and for each subclause the forward operator has a child filter operator, if the subclause has a filter, and a group by operator. Each group by operator is followed by the operators which would normally follow it in a plan.
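The fan-out on the reduce side can be pictured with a toy model. This is not Hive's operator API: `Branch` and `ForwardSketch` are hypothetical classes, rows are plain integers, and a simple counter stands in for the group by operator's aggregation.

```java
import java.util.List;
import java.util.function.Predicate;

// One child chain of the forward operator: an optional filter followed by
// a stand-in for the subclause's group by operator.
class Branch {
    private final Predicate<Integer> filter; // null if the subclause has no filter
    long count = 0;                          // toy aggregation: count(*)

    Branch(Predicate<Integer> filter) {
        this.filter = filter;
    }

    void process(int row) {
        // Re-apply the subclause's own filter, because the common filter
        // only guarantees that at least one subclause wants the row.
        if (filter == null || filter.test(row)) {
            count++;
        }
    }
}

// Toy forward operator: hands every incoming row to every child branch.
class ForwardSketch {
    private final List<Branch> children;

    ForwardSketch(List<Branch> children) {
        this.children = children;
    }

    void process(int row) {
        for (Branch b : children) {
            b.process(row);
        }
    }
}
```

Each branch sees the same reducer input but keeps its own aggregation state, which is why several group bys can share one reducer.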
I added some unit tests.
I verified that these unit tests and the old unit tests all passed.
I created a sample query consisting of a multi-insert from a table with 1,000,000 rows into 6 tables. Each table's subclause consisted of a group by and a count distinct, as well as some other aggregations and havings. The subclauses were constructed such that they could be grouped into two reducers using the new plan. I also ensured that the data was such that map aggregation was turned off early using the existing plan. I verified that this query saw a significant improvement in its CPU usage.