I have been trying to implement a storage function for Apache Pig that writes data in Trevni format. I found that the storage function was very slow when reading whole records.
I did some profiling (with YourKit) and found that most of the CPU time was being spent in org.apache.trevni.InputBuffer.readString(), specifically in the String constructor it calls to decode the bytes. I switched to java.nio.charset.CharsetDecoder.decode for deserialization and saw a big improvement. The changes are included in the attached patch.
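For readers without the patch at hand, here is a minimal sketch of the two decoding approaches being compared. It is not the code from the patch: the class name Utf8DecodeSketch, the method names, the per-thread decoder cache, and the choice to replace malformed input are all assumptions made for illustration, and it uses Java 8 features (lambdas, ThreadLocal.withInitial) for brevity. Whether the decoder path is actually faster depends on the JVM version and the data, so it should be profiled rather than assumed.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Trevni or of the attached patch.
final class Utf8DecodeSketch {

    // CharsetDecoder is stateful and not thread-safe, so keep one per thread
    // and reuse it instead of creating a new decoder for every string.
    private static final ThreadLocal<CharsetDecoder> DECODER =
        ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE));

    // Approach described as slow in the profile: decode via the String constructor.
    static String decodeWithConstructor(byte[] buf, int offset, int length) {
        return new String(buf, offset, length, StandardCharsets.UTF_8);
    }

    // Alternative: decode via a reused CharsetDecoder.
    // decode(ByteBuffer) resets the decoder before it starts, so reuse is safe.
    static String decodeWithDecoder(byte[] buf, int offset, int length) {
        try {
            return DECODER.get().decode(ByteBuffer.wrap(buf, offset, length)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the signature requires handling it.
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods take the same (buffer, offset, length) arguments, so either can be dropped into a benchmark loop over the same byte array to compare them on a given JVM.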
|Field||Original Value||New Value|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Release Note|| ||Optimized string de-serialization|
|Fix Version/s|| ||1.7.4 [ 12323742 ]|
|Assignee|| ||Joseph Adler [ jadler ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution|| ||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|