Pig / PIG-539

unable to control parallelism of Map tasks

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: impl
    • Labels: None
    • Environment: local execution + hadoop execution

    Description

      I put "PARALLEL 1" after every statement in my Pig script, and it still runs the map phase with more than one parallel task. This is a major problem because one of my operations requires a serialized (non-parallel) map.

      Probably the semantics of parallelism should be as follows:
      1. group pig operators into map/reduce stages
      2. for each stage, take the minimum of the "PARALLEL" directives given by the user for the statements executed as part of that stage

      (We'll have to decide on a rule for statements that use the combiner, which execute partially on the map side and partially on the reduce side ...)
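
The proposed rule can be sketched roughly as follows. This is only an illustration of the "minimum per stage" idea, not Pig's actual planner code: the stage names, the `stage_parallelism` and `plan_parallelism` helpers, and the fallback to a default when no statement in a stage carries a directive are all assumptions made for the example. The combiner case mentioned above is deliberately left out.

```python
from typing import Dict, List, Optional


def stage_parallelism(directives: List[Optional[int]], default: int) -> int:
    """Return the minimum PARALLEL value among a stage's statements.

    `None` stands for a statement with no PARALLEL directive. Falling back
    to `default` when no directive is present is an assumption of this sketch.
    """
    given = [d for d in directives if d is not None]
    return min(given) if given else default


def plan_parallelism(
    stages: Dict[str, List[Optional[int]]], default: int
) -> Dict[str, int]:
    """Apply the per-stage minimum rule to every map/reduce stage."""
    return {name: stage_parallelism(ds, default) for name, ds in stages.items()}


# Hypothetical grouping of a script's statements into stages (step 1),
# with the PARALLEL value the user attached to each statement.
stages = {
    "map_1": [1, 4, None],
    "reduce_1": [8, 2],
    "map_2": [None],
}

result = plan_parallelism(stages, default=10)
print(result)
```

Under this rule a single "PARALLEL 1" on any statement in a map stage would be enough to force that stage to run serially, which is the behavior the report asks for.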

          People

            Assignee: Unassigned
            Reporter: Christopher Olston (olston)