Crunch / CRUNCH-603

Cache constituent Writables inside TupleWritable `readFields` call

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Core
    • Labels: None
    • Flags: Patch

      Description

      Currently, every call to `TupleWritable.readFields` creates a new Writable for each field in the tuple: it goes through `TupleWritable.getWritable`, which instantiates the field's type via reflection (`WritableFactories.newInstance`), and then deserializes the field into that fresh instance. This burns up an unfortunate amount of CPU time.

      I've got a patch for this that caches the Writables so they can be reused (just as the TupleWritable itself is reused throughout Hadoop). It appears to work, at least for our cases. I think it will break if you ever have heterogeneous tuple types, that is, if the type of a given field changes from one tuple to the next, but that seems like a bad idea, if not already proscribed in the documentation somewhere.
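
For illustration, the caching idea can be sketched roughly as follows. This is not the attached patch and not Crunch's actual `TupleWritable`: `Field`, `IntField`, `LongField`, `CachingTuple`, and the wire format (a field count, then a one-byte type code before each field) are all hypothetical stand-ins, chosen so the example compiles without Hadoop on the classpath. The sketch also re-creates a cached instance when the type at a position changes, which is one possible way to handle the heterogeneous case.

```java
import java.io.DataInput;
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for the read side of org.apache.hadoop.io.Writable.
interface Field {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field types, playing the role of IntWritable / LongWritable.
class IntField implements Field {
    int value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class LongField implements Field {
    long value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}

// Sketch of a tuple that keeps one reusable instance per field position.
class CachingTuple {
    // Assumed type table: the byte written before each field indexes into it.
    private static final List<Class<? extends Field>> TYPES =
            List.of(IntField.class, LongField.class);

    private Field[] cache = new Field[0];
    int allocations = 0; // counts reflective instantiations, for demonstration only

    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        if (cache.length != n) {
            cache = new Field[n]; // arity changed: start over with an empty cache
        }
        for (int i = 0; i < n; i++) {
            Class<? extends Field> type = TYPES.get(in.readByte());
            // Pay for reflection only on first use, or when the type at this
            // position differs from the cached instance (heterogeneous tuples).
            if (cache[i] == null || cache[i].getClass() != type) {
                cache[i] = newInstance(type);
            }
            cache[i].readFields(in); // overwrite the cached instance in place
        }
    }

    Field get(int i) {
        return cache[i];
    }

    private Field newInstance(Class<? extends Field> type) {
        try {
            allocations++;
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type, e);
        }
    }
}
```

As with any reused Writable, the trade-off is that a caller holding on to `get(i)` across two `readFields` calls will see the value change underneath it, so values that must outlive the current record need to be copied.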

            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Steven Ruppert (stevenruppert)
            • Votes: 0
            • Watchers: 2
