Crunch / CRUNCH-604

Avoid expensive Writables.reloadWritableComparableCodes where possible


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.15.0
    • Component/s: Core
    • Labels:
      None

      Description

      Every time `setConf` is called on TupleWritable, it calls `Writables.reloadWritableComparableCodes(conf)`. Unfortunately, `SequenceFile$Reader.readValue` calls `setConf` for every value it reads, so the codes are reloaded once per record. This wastes a significant amount of CPU time.
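      To make the cost pattern concrete, here is a minimal, self-contained sketch. It assumes a simplified Configurable-style contract (`ConfigurableValue`) and a plain `Map` in place of Hadoop's `Configuration`; `NaiveTupleValue`, `ReaderLoop` and the property key are hypothetical, and the loop is not `SequenceFile.Reader`'s actual code. It only shows that when a reader reuses one value instance and calls `setConf` before each record, an unconditional reload runs once per record.

```java
import java.util.Map;

/** Hypothetical analogue of Hadoop's Configurable contract (not the real interface). */
interface ConfigurableValue {
    void setConf(Map<String, String> conf);
}

/** Hypothetical value type that, like the pre-patch TupleWritable, reloads codes on every setConf. */
class NaiveTupleValue implements ConfigurableValue {
    static int reloads = 0; // counts calls to the stand-in for the expensive reload

    @Override
    public void setConf(Map<String, String> conf) {
        reloadCodes(conf); // unconditional: runs once per record read
    }

    private static void reloadCodes(Map<String, String> conf) {
        reloads++;
        // Real code would read and parse the codes property here (the expensive part).
        conf.get("example.writable.comparable.codes"); // hypothetical key
    }
}

/** Hypothetical reader loop that reuses one value instance and calls setConf before each read. */
final class ReaderLoop {
    static void readAll(int records, ConfigurableValue reused, Map<String, String> conf) {
        for (int i = 0; i < records; i++) {
            reused.setConf(conf); // mirrors the per-record setConf call described above
            // ... deserialize record i into `reused` (omitted) ...
        }
    }
}
```

      Reading 1,000 records with `ReaderLoop.readAll` performs 1,000 reloads, even though the configuration never changes.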

      Attached are two patches: one prevents a given TupleWritable instance from reloading the codes more than once, and the other caches (by hashCode) the result of reading from the actual Hadoop configuration, which otherwise has to run regular expressions on each lookup. I can contrive situations where this would break (for example, modifying the configuration between reading two values), but nothing sane comes to mind.
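      A minimal sketch of the two ideas above, under stated assumptions: this is not the code from the attached patches. `SimpleConf` is a hypothetical stand-in for Hadoop's `Configuration` that counts reads, and `CodeRegistry`, `GuardedTuple`, the property key and the `code:className;...` format are all hypothetical. `GuardedTuple` reloads at most once per instance, and `CodeRegistry.reload` skips the configuration read when it sees the same `hashCode()` as last time. The sketch assumes that `hashCode()` is identity-based, as it is for `SimpleConf`, which does not override it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Configuration: counts reads; hashCode() is identity-based. */
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();
    int reads = 0;

    void set(String key, String value) { props.put(key, value); }

    String get(String key) {
        reads++; // stands in for the costly lookup (variable substitution etc.)
        return props.get(key);
    }
}

/** Hypothetical registry playing the role of Writables.reloadWritableComparableCodes. */
final class CodeRegistry {
    static final String CODES_KEY = "example.writable.comparable.codes"; // hypothetical key
    private static final Map<Integer, String> codes = new HashMap<>();
    private static int lastConfHash = 0;
    private static boolean loaded = false;
    static int parses = 0; // how many times the configuration was actually read and parsed

    static synchronized void reload(SimpleConf conf) {
        int h = conf.hashCode();
        if (loaded && h == lastConfHash) {
            return; // same configuration object as last time: skip the read entirely
        }
        parses++;
        codes.clear();
        String spec = conf.get(CODES_KEY); // assumed format: "1:org.example.Foo;2:org.example.Bar"
        if (spec != null && !spec.isEmpty()) {
            for (String entry : spec.split(";")) {
                String[] kv = entry.split(":", 2);
                if (kv.length == 2) {
                    codes.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
                }
            }
        }
        lastConfHash = h;
        loaded = true;
    }

    static synchronized String className(int code) { return codes.get(code); }
}

/** Hypothetical TupleWritable-like value that reloads codes at most once per instance. */
class GuardedTuple {
    private boolean codesLoaded = false;

    void setConf(SimpleConf conf) {
        if (!codesLoaded) {
            CodeRegistry.reload(conf);
            codesLoaded = true;
        }
    }
}
```

      With these guards, calling `setConf` 1,000 times on one instance reads the configuration once, and new instances given the same configuration object skip the read as well. Because the cache key is the object's identity hash, changing a value in that same configuration after the first load is not picked up, which is exactly the edge case described above; two distinct configuration objects could also, rarely, share an identity hash.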


            People

            • Assignee:
              Micah Whitacre (mkwhitacre)
              Reporter:
              Steven Ruppert (stevenruppert)
            • Votes:
              0
              Watchers:
              5
