Pig / PIG-723

Pig generates incorrect schema for generated bags after FOREACH.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Linux
      $pig --version
      Apache Pig version 0.1.0-dev (r750430)
      compiled Mar 07 2009, 09:20:13

      Description

      grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, rhs:chararray, r:float, p:float, c:float);
      grunt> rf_grouped = GROUP rf_src BY rhs;
      grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
      grunt> describe lhs_grouped;
      lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}

      I think it should be:
      lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: float}

      Because of this, we cannot UNION the two relations: a union of incompatible schemas loses all schema information, which makes further processing impossible.

      This is what we want to UNION with:

      grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, a:int);
      grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as lhs, -10F as p, -10F as c;
      grunt> describe aa;
      aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
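
The mismatch between the two `describe` outputs can be made concrete with a small model. The sketch below is only an illustration in Python, not Pig's actual schema code: the nested-tuple representation and the `shape` and `compatible` helpers are hypothetical, and Pig's real rules for merging schemas in UNION may differ. It strips aliases and compares the type structure of each column, showing that the reported `lhs_grouped` schema (bag holding fields directly) differs from `aa` (bag holding a tuple), while the expected schema matches.

```python
# Hypothetical model of Pig schemas; this is not Pig's implementation.
# A field is (alias, type). A type is a string for scalars, or
# ("bag", [fields]) / ("tuple", [fields]) for nested types.

def shape(field_type):
    """Return the type structure with all aliases stripped."""
    if isinstance(field_type, str):
        return field_type
    kind, fields = field_type
    return (kind, tuple(shape(t) for _alias, t in fields))

def compatible(schema_a, schema_b):
    """True when both schemas have the same column types, position by position."""
    return [shape(t) for _, t in schema_a] == [shape(t) for _, t in schema_b]

# As reported by `describe lhs_grouped`: the bag holds the fields directly.
reported = [
    ("rhs", "chararray"),
    ("lhs", ("bag", [("lhs", "chararray"), ("r", "float")])),
    ("p", "float"),
    ("c", "float"),
]

# As the reporter expected: the bag holds one tuple with two fields.
expected = [
    ("rhs", "chararray"),
    ("lhs", ("bag", [(None, ("tuple", [("lhs", "chararray"), ("r", "float")]))])),
    ("p", "float"),
    ("c", "float"),
]

# As reported by `describe aa`: a bag of unnamed (chararray, float) tuples.
aa = [
    ("rhs", "chararray"),
    ("lhs", ("bag", [(None, ("tuple", [(None, "chararray"), (None, "float")]))])),
    ("p", "float"),
    ("c", "float"),
]

print(compatible(reported, aa))  # False: bag-of-fields vs bag-of-tuple
print(compatible(expected, aa))  # True: both are bags of (chararray, float) tuples
```

Under this model the only difference is the missing tuple layer inside the `lhs` bag, which is consistent with the parentheses missing from the reported schema.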

      If there is something wrong with what I am trying to do, please let me know.

        Issue Links

          Activity

          Dhruv M created issue -
          Santhosh Srinivasan added a comment -

          This is a duplicate of PIG-694.

          Santhosh Srinivasan made changes -
          Field Original Value New Value
          Link This issue depends on PIG-694 [ PIG-694 ]
          Olga Natkovich added a comment -

          Not sure why this issue was marked as critical

          Olga Natkovich made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          Olga Natkovich made changes -
          Fix Version/s 0.9.0 [ 12315191 ]
          Alan Gates made changes -
          Assignee Alan Gates [ alangates ]
          Alan Gates made changes -
          Assignee Alan Gates [ alangates ] Daniel Dai [ daijy ]
          Daniel Dai added a comment -

          This is a duplicate of PIG-767.

          Daniel Dai made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Olga Natkovich made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue depends on PIG-694 [ PIG-694 ]
          Gavin made changes -
          Link This issue depends upon PIG-694 [ PIG-694 ]
          Transition          Time In Source Status  Execution Times  Last Executer   Last Execution Date
          Open -> Resolved    663d 11h 38m           1                Daniel Dai      10/Jan/11 22:38
          Resolved -> Closed  205d 1h 56m            1                Olga Natkovich  04/Aug/11 01:34

            People

            • Assignee: Daniel Dai
            • Reporter: Dhruv M
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
                Updated:
                Resolved:
