  1. Pig
  2. PIG-1633

Using an alias within a nested foreach causes indeterminate behaviour

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.4.0, 0.5.0, 0.6.0, 0.7.0
    • None
    • None
    • None

    Description

      I have created a RANDOMINT function that generates random integers from 0 up to, but not including, the specified value. For example, RANDOMINT(4) returns random integers between 0 and 3 (inclusive).

      $ hadoop fs -cat rand.dat
      f
      g
      h
      i
      j
      k
      l
      m
      

      The Pig script is as follows:

      register math.jar;
      A = load 'rand.dat' using PigStorage() as (data);
      
      B = foreach A {
              r = math.RANDOMINT(4);
              generate
                      data,
                      r as random,
                      ((r == 3)?1:0) as quarter;
              };
      
      dump B;
      

      The results are as follows:

      (f,0,0)
      (g,3,0)
      (h,0,0)
      (i,2,0)
      (j,3,0)
      (k,2,0)
      (l,0,1)
      (m,1,0)
      

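      Note that rows such as (g,3,0), (j,3,0) and (l,0,1) are inconsistent: quarter should be 1 exactly when random is 3. This appears to happen because the alias r, defined in the nested foreach block, is referenced twice in the generate clause, and each reference yields a different value.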
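
      The suspected mechanism can be modelled outside Pig. The plain Java sketch below is not Pig code and makes no claim about how Pig actually plans the script. It assumes, purely for illustration, that each reference to an alias triggers a fresh call to the random function, and compares that with evaluating the function once per row. The class name AliasEvalSketch, its method names, and the seed are hypothetical.

```java
import java.util.Random;
import java.util.function.IntSupplier;

public class AliasEvalSketch {

    // Stand-in for math.RANDOMINT(bound): uniform integer in [0, bound).
    static IntSupplier randomInt(Random rng, int bound) {
        return () -> rng.nextInt(bound);
    }

    // Assumed faulty behaviour: the alias is re-evaluated at every reference.
    // Returns how many rows have a quarter flag that disagrees with random.
    static int inconsistentRowsPerReference(int rows, long seed) {
        IntSupplier r = randomInt(new Random(seed), 4);
        int inconsistent = 0;
        for (int i = 0; i < rows; i++) {
            int random = r.getAsInt();                  // models "r as random"
            int quarter = (r.getAsInt() == 3) ? 1 : 0;  // models "(r == 3) ? 1 : 0", fresh draw
            if ((random == 3) != (quarter == 1)) {
                inconsistent++;
            }
        }
        return inconsistent;
    }

    // Expected behaviour: the function is evaluated once per row and the
    // value is reused, as in the two-statement version of the script.
    static int inconsistentRowsEvaluatedOnce(int rows, long seed) {
        IntSupplier r = randomInt(new Random(seed), 4);
        int inconsistent = 0;
        for (int i = 0; i < rows; i++) {
            int value = r.getAsInt();                   // single draw per row
            int random = value;
            int quarter = (value == 3) ? 1 : 0;
            if ((random == 3) != (quarter == 1)) {
                inconsistent++;
            }
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        int rows = 1000;
        long seed = 42L; // arbitrary seed, chosen only to make runs repeatable
        System.out.println("re-evaluated per reference: "
                + inconsistentRowsPerReference(rows, seed) + " inconsistent rows");
        System.out.println("evaluated once per row:     "
                + inconsistentRowsEvaluatedOnce(rows, seed) + " inconsistent rows");
    }
}
```

      Under this model the single-draw variant can never produce a row like (j,3,0), while the per-reference variant produces such rows regularly, which matches the pattern in the output above.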

      Rewriting the script as shown below works around the issue. Both scripts produce the same M/R jobs, so the nested form is only a matter of convenience.

      register math.jar;
      A = load 'rand.dat' using PigStorage() as (data);
      
      B = foreach A generate
              data,
              math.RANDOMINT(4) as r;
      
      C = foreach B generate
              data,
              r,
              ((r == 3)?1:0) as quarter;
      
      dump C;
      

      Is this issue related to PIG-747?
      Viraj

            People

              Assignee: Unassigned
              Reporter: Viraj Bhat