Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
0.13.0
None
None
Patch
Description
Currently, for every field in the tuple, `TupleWritable.readFields` creates a new Writable of that field's type through `TupleWritable.getWritable`, which instantiates it by reflection (`WritableFactories.newInstance`), and then deserializes the field into that new instance. Repeating this reflective instantiation for every field of every record consumes a significant amount of CPU time.
I have a patch for this that caches the field Writables so they can be reused across calls to `readFields` (just as the TupleWritable itself is reused throughout Hadoop). It appears to work, at least for our use cases. I think it will break if the field types of a tuple ever vary from one record to the next (heterogeneous tuple types), but that seems like a bad idea anyway, if it is not already proscribed somewhere in the documentation.
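A minimal sketch of the caching idea, in Java, is below. It is not the attached patch and does not depend on Hadoop: `SimpleWritable` is a hypothetical stand-in for `org.apache.hadoop.io.Writable`, and `IntField`, `CachingTuple`, and the `instantiations` counter are illustrative names only. The wire format (field count, class names, then values) is simplified and omits details of the real `TupleWritable` format. Unlike the approach described above, the sketch re-checks each field's class on every read and falls back to reflection when it changes, so heterogeneous tuples still deserialize correctly, at the cost of one string comparison per field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.Writable, so the sketch compiles without Hadoop.
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical field type used only for this example.
class IntField implements SimpleWritable {
  int value;
  public IntField() {}
  IntField(int v) { value = v; }
  public void write(DataOutput out) throws IOException { out.writeInt(value); }
  public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Hypothetical tuple that keeps its field instances between readFields calls.
class CachingTuple implements SimpleWritable {
  private SimpleWritable[] values = new SimpleWritable[0];
  int instantiations = 0; // counts reflective allocations, for illustration only

  void set(SimpleWritable... vals) { values = vals; }
  SimpleWritable get(int i) { return values[i]; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(values.length);
    for (SimpleWritable w : values) out.writeUTF(w.getClass().getName());
    for (SimpleWritable w : values) w.write(out);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int card = in.readInt();
    if (values.length != card) {
      values = Arrays.copyOf(values, card); // keeps cached instances, pads with null
    }
    String[] names = new String[card];
    for (int i = 0; i < card; i++) names[i] = in.readUTF();
    for (int i = 0; i < card; i++) {
      // Reuse the cached instance when its class matches; otherwise use reflection.
      if (values[i] == null || !values[i].getClass().getName().equals(names[i])) {
        values[i] = newInstance(names[i]);
      }
      values[i].readFields(in);
    }
  }

  private SimpleWritable newInstance(String className) throws IOException {
    try {
      instantiations++;
      return Class.forName(className).asSubclass(SimpleWritable.class)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IOException("Cannot instantiate " + className, e);
    }
  }
}

class CachingTupleDemo {
  public static void main(String[] args) throws IOException {
    CachingTuple src = new CachingTuple();
    src.set(new IntField(1), new IntField(2));
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    src.write(new DataOutputStream(buf));
    byte[] bytes = buf.toByteArray();

    // Read the same record three times into one reused tuple.
    CachingTuple dst = new CachingTuple();
    for (int r = 0; r < 3; r++) {
      dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
    }
    System.out.println("reflective instantiations: " + dst.instantiations); // prints "reflective instantiations: 2"
  }
}
```

Reading the same two-field record three times into one `CachingTuple` performs reflection only on the first read; the later reads deserialize into the cached instances.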