Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.4.0, 0.5.0, 0.6.0, 0.7.0
-
None
-
None
-
None
Description
I have created a RANDOMINT function which generates random numbers between (0 and specified value), For example RANDOMINT(4) gives random numbers between 0 and 3 (inclusive)
$hadoop fs -cat rand.dat f g h i j k l m
The pig script is as follows:
register math.jar;
A = load 'rand.dat' using PigStorage() as (data);
B = foreach A {
r = math.RANDOMINT(4);
generate
data,
r as random,
((r == 3)?1:0) as quarter;
};
dump B;
The results are as follows:
{color:red} (f,0,0) (g,3,0) (h,0,0) (i,2,0) (j,3,0) (k,2,0) (l,0,1) (m,1,0) {color}
If you observe, (j,3,0) is created because r is used both in the foreach and generate clauses and generate different values.
Modifying the above script to below solves the issue. The M/R jobs from both scripts are the same. It is just a matter of convenience.
A = load 'rand.dat' using PigStorage() as (data);
B = foreach A generate
data,
math.RANDOMINT(4) as r;
C = foreach B generate
data,
r,
((r == 3)?1:0) as quarter;
dump C;
Is this issue related to PIG:747?
Viraj