  1. Crunch (Retired)
  2. CRUNCH-604

Avoid expensive Writables.reloadWritableComparableCodes where possible


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.15.0
    • Component/s: Core
    • Labels: None

    Description

      Every time `setConf` is called on TupleWritable, `Writables.reloadWritableComparableCodes(conf)` is called. Unfortunately, `SequenceFile$Reader.readValue` calls `setConf` every single time. This burns a regrettable amount of CPU time.

      Attached is a patch that prevents a given TupleWritable instance from reloading the codes more than once, as well as a patch that caches (keyed by hashCode) the read from the actual Hadoop config, which has to run regexes and stuff. I can construe situations where this would break (somehow, you modify the configuration in between reading two values), but nothing actually sane comes to mind.
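
      The two ideas described above can be sketched roughly as follows. This is not the attached patch: it is a minimal illustration in which `ReloadGuardSketch`, `TupleLike`, `reloadCodes`, `parseCodes`, and the `"codes"` key are all hypothetical names, and a plain `Map<String, String>` stands in for the Hadoop configuration. It assumes the configuration's `hashCode` changes when its contents change, which holds for the `HashMap` used here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these are real Crunch or Hadoop classes.
final class ReloadGuardSketch {

    // Counts how often the expensive parse really runs.
    static int expensiveParses = 0;

    // Result of the last parse, remembered together with the hashCode
    // of the configuration it was parsed from.
    private static boolean cacheFilled = false;
    private static int cachedConfHash;
    private static Map<Integer, String> cachedCodes = new HashMap<>();

    // Stand-in for the costly read from the configuration.
    // Expects entries of the form "code:className" separated by ';'.
    private static Map<Integer, String> parseCodes(Map<String, String> conf) {
        expensiveParses++;
        Map<Integer, String> codes = new HashMap<>();
        for (String entry : conf.getOrDefault("codes", "").split(";")) {
            if (entry.isEmpty()) {
                continue;
            }
            String[] kv = entry.split(":", 2);
            codes.put(Integer.parseInt(kv[0]), kv[1]);
        }
        return codes;
    }

    // Idea 2: only re-parse when the configuration's hashCode differs
    // from the one seen on the previous call.
    static synchronized Map<Integer, String> reloadCodes(Map<String, String> conf) {
        int hash = conf.hashCode();
        if (!cacheFilled || hash != cachedConfHash) {
            cachedCodes = parseCodes(conf);
            cachedConfHash = hash;
            cacheFilled = true;
        }
        return cachedCodes;
    }

    // Idea 1: a given instance loads the codes at most once,
    // no matter how often setConf is called on it.
    static final class TupleLike {
        private boolean codesLoaded = false;
        private Map<Integer, String> codes;

        void setConf(Map<String, String> conf) {
            if (conf == null || codesLoaded) {
                return;
            }
            codes = reloadCodes(conf);
            codesLoaded = true;
        }

        String classNameFor(int code) {
            return codes == null ? null : codes.get(code);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("codes", "1:org.example.Foo;2:org.example.Bar");

        TupleLike first = new TupleLike();
        for (int i = 0; i < 1000; i++) {
            first.setConf(conf); // mimics a reader calling setConf per value
        }
        TupleLike second = new TupleLike();
        second.setConf(conf); // new instance, same config: served from the cache

        System.out.println("expensive parses: " + expensiveParses);
    }
}
```

      The sketch also shows the caveat mentioned above: once an instance has loaded its codes, a later change to the configuration is not picked up by that instance.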

          People

            mkwhitacre Micah Whitacre
            stevenruppert Steven Ruppert
            Votes: 0
            Watchers: 5
