Details
-
Improvement
-
Status: Open
-
Trivial
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
I have very complex Pig scripts which are often concatenations and iterations of a large number of map reduce tasks.
I've gotten into the habit of using the following construct in my code:
set job.name '$DIR/$DATE/summary.bz'; A = load ... ... store Z into '$DIR/$DATE/summary.bz' using PigStorage();
But it would be really useful if Pig script parsing automagically set these job.name values.
Ideally I'd like to have Pig just automagically construct job names for me so I can trace execution of multihour jobs in the HOD progress pages. Something like:
process-dates.pig A = LOAD /data/logs/daily/20090408 ... STORE Z into mysummary/20090408/summary.bz map-group-combiner-sort
Okay you say, I could construct this kind of job.name myself if this is what I want.
Well:
1) I'd really like to have a default constructed by Pig so I don't have to
2) Pig has information about what is happening that I don't have such as:
- The name of the script passed to Pig
- The glob expansion of the file pathname in the LOAD statement
- The execution plan of pig that would tell me what the map-group-combine-sort-reduce group looks like
- The name of intermediate STORE operations that are being performed