  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4518

SparkOperator should correspond to complete Spark job

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

      Description

      SparkPlan, which was added in PIG-4374, creates a new SparkOperator at every shuffle boundary (indicated by the presence of POGlobalRearrange in the corresponding physical plan). This is unnecessary for the Spark engine: it relies on Spark itself to perform the shuffle (via groupBy(), reduceByKey() and CoGroupedRDD), so the plan does not need to explicitly identify "map" and "reduce" operations.

      It is also cleaner if a single SparkOperator represents a single complete Spark job.
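
      The difference between the two planning strategies can be illustrated with a toy model. This is only a sketch, not Pig's actual Java implementation: the physical plan is modelled as a flat list of operator names, the functions split_at_shuffles and single_job are hypothetical, and the exact point at which the old planner cut the plan around POGlobalRearrange is an assumption.

      ```python
      from typing import List

      # Physical operator that marks a shuffle boundary in the plan.
      SHUFFLE_OP = "POGlobalRearrange"


      def split_at_shuffles(physical_plan: List[str]) -> List[List[str]]:
          """Toy version of the old behaviour: one operator group per shuffle boundary.

          Assumption: the cut is made immediately after POGlobalRearrange.
          """
          groups: List[List[str]] = []
          current: List[str] = []
          for op in physical_plan:
              current.append(op)
              if op == SHUFFLE_OP:
                  groups.append(current)
                  current = []
          if current:
              groups.append(current)
          return groups


      def single_job(physical_plan: List[str]) -> List[List[str]]:
          """Toy version of the proposed behaviour: the whole plan is one group,
          and the shuffle is left to Spark."""
          return [list(physical_plan)]


      # Hypothetical physical plan for a simple GROUP BY script.
      plan = [
          "POLoad",
          "POLocalRearrange",
          "POGlobalRearrange",
          "POPackage",
          "POForEach",
          "POStore",
      ]

      print(len(split_at_shuffles(plan)))  # number of groups under the old scheme
      print(len(single_job(plan)))         # number of groups under the proposed scheme
      ```

      In this model the old scheme yields one group per "map" or "reduce" side of each shuffle, while the proposed scheme keeps every physical operator of the job in a single group.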

        Attachments

        1. PIG-4518.1.patch (23 kB, Mohit Sabharwal)
        2. PIG-4518.patch (21 kB, Mohit Sabharwal)

              People

              • Assignee: Mohit Sabharwal (mohitsabharwal)
              • Reporter: Mohit Sabharwal (mohitsabharwal)
              • Votes: 0
              • Watchers: 3
