CRUNCH-509: Crunch with Spark doesn't name all outputs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: Core
    • Labels:
      None

      Description

      Crunch currently does not "name" all outputs when running with a SparkPipeline[1]. This becomes a problem because some Targets (based on CRUNCH-82) have coded in checks to ensure that the name is populated. Specifically, the implementation I'm running into issues with is the Kite DatasetTarget[2].

      I need to read up a bit on the context to see whether this is a Crunch issue or a Kite issue, and where it is easiest and most correct to fix. Feedback from Josh Wills or Tom White would be welcome.

      [1] - https://github.com/apache/crunch/blob/3ab0b078c47f23b3ba893fdfb05fd723f663d02b/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L337
      [2] - https://github.com/kite-sdk/kite/blob/e080f0237e7383a16fff8547ad43387ccf55c473/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L178
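
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the real Crunch or Kite classes: `NamedTarget`, `StrictTarget`, the two `configure...` helpers, and the `"out" + index` naming scheme are all hypothetical stand-ins, chosen only to show how a target that insists on a populated name breaks when the runtime passes none.

```java
import java.util.Objects;

public class OutputNameSketch {

    /** Hypothetical stand-in for a target that is configured with an output name. */
    interface NamedTarget {
        void configure(String outputPath, String name);
    }

    /** Hypothetical target that rejects unnamed outputs, as the description says some Targets do. */
    static class StrictTarget implements NamedTarget {
        String configuredName;

        @Override
        public void configure(String outputPath, String name) {
            // The kind of check the description refers to: the name must be populated.
            this.configuredName = Objects.requireNonNull(name, "Output name should not be null");
        }
    }

    /** Mimics a runtime that does not name the output (the reported behaviour). */
    static void configureWithoutName(NamedTarget target, String outputPath) {
        target.configure(outputPath, null);
    }

    /** Mimics a runtime that derives a name per output (assumed naming scheme). */
    static void configureWithName(NamedTarget target, String outputPath, int outputIndex) {
        target.configure(outputPath, "out" + outputIndex);
    }

    /** Returns true if the strict target rejects an unnamed output. */
    static boolean failsWithoutName() {
        try {
            configureWithoutName(new StrictTarget(), "/tmp/out");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unnamed output rejected: " + failsWithoutName());

        StrictTarget target = new StrictTarget();
        configureWithName(target, "/tmp/out", 0);
        System.out.println("named output accepted as: " + target.configuredName);
    }
}
```

Under these assumptions, the fix can live on either side: the runtime can always supply a name, or the target can relax its check. The sketch only shows the first option.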

        Attachments

        1. CRUNCH-509b.patch
          7 kB
          Josh Wills
        2. CRUNCH-509.patch
          3 kB
          Micah Whitacre

            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Micah Whitacre (mkwhitacre)
            • Votes: 0
            • Watchers: 4
