Pig / PIG-1831

Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The script below produces different output from run to run in local mode. It looks like, in local mode, a relation obtained through streaming has to be stored before it can be used later in the script.

      For example, consider the following script:

      DEFINE MySTREAMUDF `test.sh`;
      A = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3,data4 );
      B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
      --STORE B into 'output.B';
      C = JOIN B by wId LEFT OUTER, A by myId;
      D = FOREACH C GENERATE B::wId,B::num,data4 ;
      D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);
      --STORE D into 'output.D';
      E = foreach B GENERATE wId,num;
      F = DISTINCT E;
      G = GROUP F ALL;
      H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
      I = CROSS D,H;
      STORE I into 'output.I';

      test.sh
      ---------
      #!/bin/bash
      cut -f1,3

      The input is:
      abcd label1 11 feature1
      acbd label2 22 feature2
      adbc label3 33 feature3
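
To make the two streaming steps concrete, the sketch below runs the same `cut -f1,3` command that test.sh uses. It runs it once on the sample input, which gives the tuples relation B should contain, and once on rows shaped like (wId, num, data4), which is what D holds before its second STREAM. It assumes the fields are tab-separated, the default delimiter for both PigStorage() and cut; the report does not show the delimiter explicitly. The variable names are illustrative only.

```shell
#!/bin/sh
# Sample input from the report, with tab separators (an assumption).
input=$(printf 'abcd\tlabel1\t11\tfeature1\nacbd\tlabel2\t22\tfeature2\nadbc\tlabel3\t33\tfeature3\n')

# First STREAM (A -> B): keep fields 1 and 3, i.e. (wId, num).
stream_b=$(printf '%s\n' "$input" | cut -f1,3)
printf '%s\n' "$stream_b"

# Second STREAM (D -> D): the rows going in are (wId, num, data4),
# so fields 1 and 3 are (wId, data4). Row order here is illustrative.
d_rows=$(printf 'abcd\t11\tfeature1\nacbd\t22\tfeature2\nadbc\t33\tfeature3\n')
stream_d=$(printf '%s\n' "$d_rows" | cut -f1,3)
printf '%s\n' "$stream_d"
```

Note that the second field coming out of the second stream is a string such as feature1, which does not match the declared schema f2:int.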

      If I store relations B and D, I get the following result every time:
      acbd 3
      abcd 3
      adbc 3

      But if I don't store relations B and D, I get an empty output. This behaviour is itself random: occasionally, in roughly 1 out of 5 runs, there is output.
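
Because the failure is intermittent, a single run says little. The helper below is a minimal sketch for measuring how often a run leaves non-empty output. The function name `count_nonempty_runs` and the file name `repro.pig` are hypothetical and not part of the report. It also assumes that Pig writes its results as `part-*` files inside the output directory.

```shell
#!/bin/sh
# Run a command several times and print how many runs left non-empty output.
# Usage: count_nonempty_runs <runs> <output_dir> <command...>
# The body runs in a subshell so its variables do not leak into the caller.
count_nonempty_runs() (
  runs=$1
  outdir=$2
  shift 2
  nonempty=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # Clear the previous result; Pig normally fails if the output location exists.
    rm -rf "$outdir"
    # Ignore the exit status; only the produced output matters here.
    "$@" >/dev/null 2>&1 || true
    # Count the run if the part files together contain at least one byte.
    if [ -n "$(cat "$outdir"/part-* 2>/dev/null | head -c 1)" ]; then
      nonempty=$((nonempty + 1))
    fi
    i=$((i + 1))
  done
  echo "$nonempty"
)

# Example (repro.pig is a hypothetical file holding the script from this report):
# count_nonempty_runs 5 output.I pig -x local repro.pig
```

With the STORE statements for B and D commented out, the report suggests this would print a small number such as 1 for five runs; with them enabled it should print 5.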

        Attachments

        1. PIG-1831-0.patch
          0.8 kB
          Jianyong Dai
        2. PIG-1831-1.patch
          23 kB
          Jianyong Dai
        3. PIG-1831-2.patch
          24 kB
          Jianyong Dai
        4. PIG-1831-3.patch
          24 kB
          Jianyong Dai

            People

            • Assignee: Jianyong Dai (daijy)
            • Reporter: Vivek Padmanabhan (vivekp)