Crunch (Retired) / CRUNCH-241

Write side outputs from the Mapper stage of a MapReduce job

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Core
    • Labels: None

    Description

      Right now, Crunch always writes output files from the "last" stage of whatever kind of job it runs: either the reduce side of a MapReduce job or the map side of a map-only job. As a result, we often have to process the same input twice: once for the map-side outputs and again for the reduce-side outputs.

      This change adds the ability for Crunch to write side outputs from the mapper phase of a MapReduce job, i.e., we can write output Targets from both the map side and the reduce side. Pipelines that perform both kinds of writes should run much faster, since they no longer need a second pass over the input.
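
      To make the idea concrete, here is a minimal, self-contained sketch of a single job that produces both a map-side output and a reduce-side output from one pass over its input. It is a plain-Java simulation, not Crunch or Hadoop code and not the contents of the attached patch; the names SideOutputSketch, JobOutputs, and run are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of one MapReduce job with a mapper-side output.
// Nothing here is part of the Crunch API.
public class SideOutputSketch {

    /** Both outputs of one simulated job, plus a counter of input reads. */
    static final class JobOutputs {
        final List<String> mapSide = new ArrayList<>();           // written during the map phase
        final Map<String, Integer> reduceSide = new TreeMap<>();  // written during the reduce phase
        int inputReads = 0;                                        // records read from the input
    }

    static JobOutputs run(List<String> lines) {
        JobOutputs out = new JobOutputs();
        // Stand-in for the shuffle: values grouped by key, sorted by key.
        Map<String, List<Integer>> shuffle = new TreeMap<>();

        // Map phase: the only pass over the input.
        for (String line : lines) {
            out.inputReads++;
            String cleaned = line.trim().toLowerCase(Locale.ROOT);
            if (cleaned.isEmpty()) {
                continue; // drop blank records
            }
            // Side output: the cleaned record is written from the map side...
            out.mapSide.add(cleaned);
            // ...and the same record also feeds the shuffle for the reducer.
            for (String word : cleaned.split("\\s+")) {
                shuffle.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce phase: sum the grouped values and write the reduce-side output.
        for (Map.Entry<String, List<Integer>> entry : shuffle.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            out.reduceSide.put(entry.getKey(), sum);
        }
        return out;
    }

    public static void main(String[] args) {
        JobOutputs out = run(Arrays.asList("Hello world", "  ", "hello Crunch"));
        System.out.println("map-side output:    " + out.mapSide);
        System.out.println("reduce-side output: " + out.reduceSide);
        System.out.println("input records read: " + out.inputReads);
    }
}
```

      In this sketch each of the three input records is read exactly once, yet the job yields both the cleaned records (map side) and the word counts (reduce side). Without a mapper-side output, producing the cleaned records would take a separate map-only job over the same input.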

      Attachments

        1. CRUNCH-241.patch
          18 kB
          Josh Wills

          People

            Assignee: Josh Wills (jwills)
            Reporter: Josh Wills (jwills)