Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
local execution + hadoop execution
Description
I put "PARALLEL 1" following every statement in my pig script, and it still executes maps with more than 1 parallel task. This is a major problem because for one of my operations I need to have a serialized (non-parallel) map.
Probably the semantics of parallelism should be as follows:
1. group pig operators into map/reduce stages
2. for each stage, take the minimum of the "Parallel" directives given by the user for statements executed as part of that stage
(We'll have to decide on a rule for statements that use the combiner, which execute partially on the map side and partially on the reduce side ...)