In HoS we currently use operator stats to determine reducer parallelism. However, operator stats are often inaccurate, especially when column stats are not available. This can produce extremely poor reducer parallelism and cause a HoS query to run for an extremely long time.
This JIRA proposes an alternative way to compute reducer parallelism, similar to how MR does it. Here's the approach we are suggesting:
1. when computing the parallelism for a MapWork, use stats associated with the TableScan operator;
2. when computing the parallelism for a ReduceWork, use the maximum parallelism from all its parents.
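
The two rules above can be sketched roughly as follows. This is only an illustration, not Hive code: `ParallelismSketch`, `Work`, `mapWork`, `reduceWork` and `parallelism` are hypothetical names, and the sketch assumes the map-side estimate is the TableScan data size divided by a bytes-per-reducer target and capped at a maximum, which is how the MR path uses `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a work graph; these are NOT Hive's actual classes.
final class ParallelismSketch {

    /** A node in the work graph: either a map-side work or a reduce-side work. */
    static final class Work {
        final String name;
        final boolean isMap;
        final long tableScanBytes;            // data size from TableScan stats (map works only)
        final List<Work> parents = new ArrayList<>();

        Work(String name, boolean isMap, long tableScanBytes) {
            this.name = name;
            this.isMap = isMap;
            this.tableScanBytes = tableScanBytes;
        }
    }

    static Work mapWork(String name, long tableScanBytes) {
        return new Work(name, true, tableScanBytes);
    }

    static Work reduceWork(String name, Work... parents) {
        Work w = new Work(name, false, 0L);
        w.parents.addAll(Arrays.asList(parents));
        return w;
    }

    /**
     * Rule 1: for a map work, derive parallelism from the TableScan data size
     *         (assumed here: ceil(bytes / bytesPerReducer), clamped to [1, maxReducers]).
     * Rule 2: for a reduce work, take the maximum parallelism over all its parents.
     */
    static int parallelism(Work work, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        if (work.isMap) {
            long estimate = (work.tableScanBytes + bytesPerReducer - 1) / bytesPerReducer;
            return (int) Math.max(1L, Math.min((long) maxReducers, estimate));
        }
        int max = 1;  // a reduce work with no parents falls back to a single reducer
        for (Work parent : work.parents) {
            max = Math.max(max, parallelism(parent, bytesPerReducer, maxReducers));
        }
        return max;
    }
}
```

Because a ReduceWork only looks at its parents, the estimate depends solely on table-level data sizes at the leaves of the plan and never on the per-operator stats in between, which is the point of the proposal.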