Crunch (Retired) / CRUNCH-128

Allow one stage of an MR pipeline to depend on another target being created


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels: None

      Description

      Several problems (e.g., map-side joins and total orderings) require a guarantee that one PCollection has been written to the FileSystem before another MapReduce pipeline that depends on that file is allowed to run. This does not fit cleanly into Crunch's current set of abstractions, which is why we currently force users to execute pipelines via an explicit call to run(): that call guarantees the files have been created before the second stage runs.

      We should add the ability for a particular PCollection to require that a SourceTarget instance has been created before it can be executed, and the planner should take these dependencies into account when it plans the MR pipeline.
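      The planning step described above can be illustrated with a minimal Java sketch. It assumes each stage declares the targets (e.g., file paths) it requires and the single target it writes; the planner then groups stages into waves, so a stage only runs after an earlier wave has written everything it depends on (a level-by-level topological ordering). The Stage record, the plan method, and the example paths are hypothetical illustrations, not Crunch's API or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Crunch's API): stages declare which targets must
 * already exist before they may run, and the planner groups them into waves.
 */
public class DependencyPlanner {

    /** A hypothetical pipeline stage: requires some targets, writes one target. */
    public record Stage(String name, Set<String> requiredTargets, String outputTarget) {}

    /**
     * Groups stages into waves. Every stage in a wave has all of its required
     * targets either pre-existing or written by an earlier wave.
     */
    public static List<List<String>> plan(List<Stage> stages, Set<String> existingTargets) {
        Set<String> written = new HashSet<>(existingTargets);
        List<Stage> pending = new ArrayList<>(stages);
        List<List<String>> waves = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Collect every stage whose dependencies are already on the file system.
            List<Stage> ready = new ArrayList<>();
            for (Stage s : pending) {
                if (written.containsAll(s.requiredTargets())) {
                    ready.add(s);
                }
            }
            if (ready.isEmpty()) {
                // Nothing can run: a cycle or a target that no stage ever writes.
                throw new IllegalStateException("Unsatisfiable dependencies: " + pending);
            }
            List<String> wave = new ArrayList<>();
            for (Stage s : ready) {
                wave.add(s.name());
                // The output counts as created only for later waves, since
                // 'ready' was computed before this loop.
                written.add(s.outputTarget());
            }
            pending.removeAll(ready);
            waves.add(wave);
        }
        return waves;
    }

    public static void main(String[] args) {
        // A map-side join needs the small side written before it can run.
        List<Stage> stages = List.of(
            new Stage("buildSmallSide", Set.of("/in/small"), "/tmp/small-table"),
            new Stage("mapSideJoin", Set.of("/in/large", "/tmp/small-table"), "/out/joined"));
        System.out.println(plan(stages, Set.of("/in/small", "/in/large")));
        // prints [[buildSmallSide], [mapSideJoin]]
    }
}
```

      Grouping into waves rather than a flat ordering reflects the idea that stages with no dependency on each other can still be planned into the same MapReduce run, while a dependent stage is deferred until its input has been written.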

        Attachments

        1. CRUNCH-128.patch (20 kB, Josh Wills)
        2. CheckpointingIT.java (1 kB, Gabriel Reid)
        3. CRUNCH-128v2.patch (39 kB, Josh Wills)
        4. CRUNCH-128-with-op.patch (55 kB, Josh Wills)
        5. CRUNCH-128-pdo-options.patch (33 kB, Josh Wills)


            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Josh Wills (jwills)
            • Votes: 0
            • Watchers: 3
