Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
0.7.0, 0.7.1, 0.8.1
None
Description
The TextConverter in PrimitiveObjectInspectorConverter.java is very inefficient when the input object is already a Text or a Lazy object. Because it always calls getPrimitiveJavaObject, each Text is decoded into a String and then re-encoded into a Text. The fix is to check whether preferWritable() is true and, if so, call getPrimitiveWritable(input) instead, which avoids the round trip through String.
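To make the idea concrete, here is a minimal, self-contained sketch of the fast path. It does not use the real Hive or Hadoop classes: Utf8Text, StringInspector, and WritableStringInspector are hypothetical stand-ins I made up for illustration, and only the method names preferWritable, getPrimitiveWritable, and getPrimitiveJavaObject are taken from the description above. The actual patch may look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextConverterSketch {

    /** Hypothetical stand-in for a Text-like writable: a mutable holder of UTF-8 bytes. */
    static final class Utf8Text {
        private byte[] bytes = new byte[0];

        /** Slow path: encodes a String into UTF-8 bytes. */
        void set(String s) { bytes = s.getBytes(StandardCharsets.UTF_8); }

        /** Fast path: copies the bytes directly, with no decode or encode step. */
        void set(Utf8Text other) { bytes = Arrays.copyOf(other.bytes, other.bytes.length); }

        /** Decodes the UTF-8 bytes into a String. */
        @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    /** Hypothetical stand-in for a string object inspector. */
    interface StringInspector {
        boolean preferWritable();
        Utf8Text getPrimitiveWritable(Object input);
        String getPrimitiveJavaObject(Object input);
    }

    /** Inspector whose inputs are already Utf8Text, so it prefers the writable form. */
    static final class WritableStringInspector implements StringInspector {
        public boolean preferWritable() { return true; }
        public Utf8Text getPrimitiveWritable(Object input) { return (Utf8Text) input; }
        public String getPrimitiveJavaObject(Object input) { return input.toString(); }
    }

    /** Converter that skips the String round trip when the inspector prefers writables. */
    static final class TextConverter {
        private final StringInspector inspector;
        private final Utf8Text result = new Utf8Text(); // reused across calls

        TextConverter(StringInspector inspector) { this.inspector = inspector; }

        Utf8Text convert(Object input) {
            if (input == null) {
                return null;
            }
            if (inspector.preferWritable()) {
                // Input is already in writable form: copy bytes directly.
                result.set(inspector.getPrimitiveWritable(input));
            } else {
                // Fall back to the Java object (decode, then re-encode).
                result.set(inspector.getPrimitiveJavaObject(input));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Utf8Text in = new Utf8Text();
        in.set("grep me");
        TextConverter converter = new TextConverter(new WritableStringInspector());
        System.out.println(converter.convert(in));
    }
}
```

In this sketch the converter reuses a single result object across calls, so the returned value is only valid until the next convert call; whether the real converter behaves the same way is an assumption on my part.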
To test performance, I ran the Grep query from https://issues.apache.org/jira/browse/HIVE-396 on a cluster of three EC2 large nodes (2 slaves, 1 master) over 6GB of data. The query ran as 21 map tasks. With the current 0.8.1 version it took 81 seconds; after patching it took 66 seconds, about 18.5% less time.
I will attach a patch and testcases.