[CRUNCH-603] Cache constituent Writables inside TupleWritable `readFields` call - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.13.0
Fix Version/s: None
Component/s: Core
Labels: None
Flags: Patch

Description

Currently, `TupleWritable.readFields` creates a new Writable for every field in the tuple in order to deserialize that field. Each instance is created via reflection (`WritableFactories.newInstance`, called through `TupleWritable.getWritable`). This burns up an unfortunate amount of CPU time.

I've got a patch for this that caches the Writables so they can be reused (just as the TupleWritable itself is reused throughout Hadoop). It appears to work, at least for our cases. I think it will break if you ever have heterogeneous tuple types, but that seems like a bad idea, if not already proscribed in the documentation somewhere.
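
For illustration only, here is a minimal sketch of the caching idea. It is not the attached patch and does not use the real Crunch or Hadoop classes: `SketchWritable`, `SketchInt`, `SketchLong`, `CachingTupleSketch`, the integer type codes, and the wire layout (field count, then a type code and payload per field) are all hypothetical stand-ins, chosen so the example compiles without Hadoop on the classpath. It also assumes that "heterogeneous" means a field position whose type changes from one record to the next, and guards against that by comparing the cached type code before reusing an instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for the read side of org.apache.hadoop.io.Writable.
interface SketchWritable {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field types, identified on the wire by type codes 0 and 1.
final class SketchInt implements SketchWritable {
    int value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

final class SketchLong implements SketchWritable {
    long value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}

// Hypothetical tuple that keeps one instance per field position and reuses it
// across readFields calls while the type code at that position is unchanged.
final class CachingTupleSketch implements SketchWritable {
    int allocations = 0;                              // instances created so far
    private SketchWritable[] fields = new SketchWritable[0];
    private int[] codes = new int[0];

    // Stands in for the reflective instantiation this issue wants to avoid repeating.
    private SketchWritable newInstance(int code) {
        SketchWritable w;
        switch (code) {
            case 0: w = new SketchInt(); break;
            case 1: w = new SketchLong(); break;
            default: throw new IllegalStateException("Unknown type code: " + code);
        }
        allocations++;
        return w;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        if (n != fields.length) {                     // tuple size changed: resize the cache
            fields = Arrays.copyOf(fields, n);
            codes = Arrays.copyOf(codes, n);
        }
        for (int i = 0; i < n; i++) {
            int code = in.readInt();
            if (fields[i] == null || codes[i] != code) {
                fields[i] = newInstance(code);        // cache miss or type change
                codes[i] = code;
            }
            fields[i].readFields(in);                 // overwrite the cached instance in place
        }
    }

    SketchWritable get(int i) { return fields[i]; }

    // Convenience wrapper so callers need not handle IOException.
    void readFrom(byte[] bytes) {
        try {
            readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes Long values with code 1 and any other Number as an int with code 0.
    static byte[] encode(Number... values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(values.length);
            for (Number v : values) {
                if (v instanceof Long) { out.writeInt(1); out.writeLong(v.longValue()); }
                else { out.writeInt(0); out.writeInt(v.intValue()); }
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, reading two tuples with the same field types in a row allocates field instances only on the first read. The trade-off is the usual one for object reuse: a reference obtained from `get` is overwritten by the next read, so callers that need to keep a value must copy it first.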

Attachments

0001-TupleWritable-reuse-Writable-instances-where-possibl.patch, 12 kB, 18/Apr/16 19:34, Steven Ruppert

People

Assignee: Josh Wills
Reporter: Steven Ruppert
Votes: 0
Watchers: 2

Dates

Created: 18/Apr/16 19:30
Updated: 18/Apr/16 19:34