Description
Similar to HIVE-20032, but for groupByKey. The tricky part with groupByKey is that the hashCode must be preserved until the key is partitioned (via the HashPartitioner); after that, it no longer needs to be preserved. Spark's groupByKey operator does still need a hashCode, since it puts everything into a map, but that hash code can differ from the one stored in HiveKey. The hashCode in HiveKey only matters for deciding which partition a key is assigned to.
The drawback is that computing the hashCode for each HiveKey may cost more CPU, so we should profile the change to confirm the overhead is acceptable.
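A minimal Java sketch of the two phases described above, under simplifying assumptions. It is not Hive's or Spark's code. `SketchKey` is a hypothetical stand-in for HiveKey (key bytes plus a separately supplied hash code). `ShuffledKey` is a hypothetical key whose hash code was not carried through the shuffle and is recomputed from its bytes. `getPartition` reproduces the non-negative-modulo arithmetic of Spark's HashPartitioner instead of calling Spark. The sketch also assumes that equal key bytes always carry the same hash code, so they end up in the same partition. It only illustrates why the original hash code matters for routing but not for the map-based grouping that follows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByKeySketch {

    // Hypothetical stand-in for HiveKey: serialized key bytes plus a hash code
    // supplied separately (not derived from the bytes), used only for routing.
    static final class SketchKey {
        final byte[] bytes;
        final int partitionHash;

        SketchKey(byte[] bytes, int partitionHash) {
            this.bytes = bytes;
            this.partitionHash = partitionHash;
        }
    }

    // Hypothetical post-shuffle key: only the bytes survive, and hashCode is
    // recomputed from them, so it generally differs from partitionHash.
    static final class ShuffledKey {
        final byte[] bytes;

        ShuffledKey(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ShuffledKey && Arrays.equals(bytes, ((ShuffledKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // Same arithmetic as Spark's HashPartitioner: non-negative hash modulo.
    static int getPartition(int hash, int numPartitions) {
        int rawMod = hash % numPartitions;
        return rawMod + (rawMod < 0 ? numPartitions : 0);
    }

    // Phase 1: route each record by its original hash code.
    // Phase 2: group inside each partition with a HashMap keyed only on the bytes.
    static List<Map<ShuffledKey, List<String>>> shuffleAndGroup(
            List<SketchKey> keys, List<String> values, int numPartitions) {
        List<Map<ShuffledKey, List<String>>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new HashMap<>());
        }
        for (int i = 0; i < keys.size(); i++) {
            SketchKey key = keys.get(i);
            int p = getPartition(key.partitionHash, numPartitions);
            // The original hash code is dropped here; only the bytes are kept.
            partitions.get(p)
                    .computeIfAbsent(new ShuffledKey(key.bytes), k -> new ArrayList<>())
                    .add(values.get(i));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<SketchKey> keys = Arrays.asList(
                new SketchKey("a".getBytes(StandardCharsets.UTF_8), 7),
                new SketchKey("b".getBytes(StandardCharsets.UTF_8), -3),
                new SketchKey("a".getBytes(StandardCharsets.UTF_8), 7));
        List<String> values = Arrays.asList("v1", "v2", "v3");
        List<Map<ShuffledKey, List<String>>> parts = shuffleAndGroup(keys, values, 4);
        for (int p = 0; p < parts.size(); p++) {
            for (Map.Entry<ShuffledKey, List<String>> e : parts.get(p).entrySet()) {
                System.out.println("partition " + p + ": "
                        + new String(e.getKey().bytes, StandardCharsets.UTF_8) + " -> " + e.getValue());
            }
        }
    }
}
```

Running `main` prints `partition 1: b -> [v2]` followed by `partition 3: a -> [v1, v3]`. The routing depends only on `partitionHash`. The grouping inside each partition works with the recomputed byte-based hash, which is the property this issue relies on. That recomputation is also where the extra CPU cost mentioned above would show up.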