  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4518

SparkOperator should correspond to complete Spark job

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

      SparkPlan, which was added in PIG-4374, creates a new SparkOperator at every shuffle boundary (denoted by the presence of POGlobalRearrange in the corresponding physical plan). This is unnecessary for the Spark engine, which relies on Spark itself to perform the shuffle (using groupBy(), reduceByKey() and CoGroupedRDD) and therefore does not need to explicitly identify "map" and "reduce" operations.

      It is also cleaner if a single SparkOperator represents a single complete Spark job.
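
      The difference between the two plan shapes can be illustrated with a toy model. This is only a sketch, not Pig's actual SparkPlan code: the physical plan is reduced to a flat list of operator names, the function names are hypothetical, and the exact point at which the real compiler cut the plan around POGlobalRearrange is an assumption made for illustration.

```python
from typing import List

# Shuffle marker named in the description above.
SHUFFLE_OP = "POGlobalRearrange"


def split_at_shuffles(plan: List[str]) -> List[List[str]]:
    """Hypothetical model of the old behaviour: close the current
    operator after each shuffle marker and start a new one."""
    operators: List[List[str]] = []
    current: List[str] = []
    for op in plan:
        current.append(op)
        if op == SHUFFLE_OP:
            operators.append(current)
            current = []
    if current:
        operators.append(current)
    return operators


def one_operator_per_job(plan: List[str]) -> List[List[str]]:
    """Hypothetical model of the proposed behaviour: the whole plan
    stays in one operator and Spark handles the shuffle internally."""
    return [list(plan)]


# A simplified group-by pipeline (illustrative operator sequence only).
plan = [
    "POLoad",
    "POLocalRearrange",
    "POGlobalRearrange",
    "POPackage",
    "POForEach",
    "POStore",
]

old_style = split_at_shuffles(plan)
new_style = one_operator_per_job(plan)
print(len(old_style), "operators when splitting at shuffles")
print(len(new_style), "operator when one operator represents the job")
```

      In this model a job with n shuffles yields up to n + 1 operators under the old scheme, while the proposed scheme always yields one.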

      Attachments

        1. PIG-4518.1.patch
          23 kB
          Mohit Sabharwal
        2. PIG-4518.patch
          21 kB
          Mohit Sabharwal

            People

              Assignee: Mohit Sabharwal (mohitsabharwal)
              Reporter: Mohit Sabharwal (mohitsabharwal)
              Votes: 0
              Watchers: 3
