  1. Crunch (Retired)
  2. CRUNCH-3

Replicated ("map-side") joins

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: MapReduce Patterns
    • Labels: None

    Description

      Replicated joins are a common way to improve performance when joining a large dataset with a small one. The smaller dataset is loaded into memory in each mapper/reducer task and is then joined with the larger dataset's records as the MapReduce job processes them.
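
The idea described above can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the Crunch API and not the contents of the attached patch: the function names `build_lookup` and `replicated_join` are hypothetical, both datasets are assumed to be iterables of `(key, value)` pairs, and an inner join is assumed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def build_lookup(small: Iterable[Tuple[Hashable, object]]) -> Dict[Hashable, List[object]]:
    """Load the small dataset into an in-memory map from key to values.

    In a MapReduce setting this would happen once per task, before any
    record of the large dataset is processed.
    """
    lookup: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return dict(lookup)


def replicated_join(
    large: Iterable[Tuple[Hashable, object]],
    lookup: Dict[Hashable, List[object]],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Stream the large dataset and emit one joined row per matching pair.

    Records whose key is absent from the lookup are dropped (inner join).
    """
    for key, left in large:
        for right in lookup.get(key, ()):
            yield key, left, right


if __name__ == "__main__":
    small = [(1, "a"), (2, "b"), (2, "c")]
    large = [(2, "x"), (3, "y"), (1, "z")]
    for row in replicated_join(large, build_lookup(small)):
        print(row)
```

Because the large dataset is only streamed and never shuffled by key, the join needs no reduce phase of its own; the cost is that every task must hold the whole small dataset in memory.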

      Attachments

        1. mapside-joins.patch
          29 kB
          Gabriel Reid

          People

            gabriel.reid Gabriel Reid
            jwills Josh Wills
