Pig / PIG-723

Pig generates incorrect schema for generated bags after FOREACH.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None
    • Environment: Linux
      $pig --version
      Apache Pig version 0.1.0-dev (r750430)
      compiled Mar 07 2009, 09:20:13

    Description

      grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, rhs:chararray, r:float, p:float, c:float);
      grunt> rf_grouped = GROUP rf_src BY rhs;
      grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
      grunt> describe lhs_grouped;
      lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}

      I think it should be:
      lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: float}

      Because of this, we cannot UNION the two relations: a UNION of relations with incompatible schemas discards the schema information entirely, which makes further processing impossible.

      This is what we want to UNION with:

      grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, a:int);
      grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as lhs, -10F as p, -10F as c;
      grunt> describe aa;
      aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
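
      For reference, the UNION step described above would presumably look like the sketch below. The alias `combined` is hypothetical and does not appear in the original report; `lhs_grouped` and `aa` are the relations defined earlier. No DESCRIBE output is shown because the exact message depends on the Pig version.

```pig
-- Hypothetical alias `combined`; lhs_grouped and aa come from the statements above.
combined = UNION lhs_grouped, aa;

-- As reported, the bag field `lhs` has a different shape in the two inputs:
-- bare fields {lhs, r} in lhs_grouped versus a tuple inside the bag {(chararray, float)} in aa.
-- The reporter observes that the unioned relation loses its schema as a result.
DESCRIBE combined;
```

      If the schema of `lhs_grouped` were reported as the reporter expects, both `lhs` fields would be bags of (chararray, float) tuples and the two inputs would be structurally compatible.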

      If there is something wrong with what I am trying to do, please let me know.

    People

      Assignee: Daniel Dai (daijy)
      Reporter: Dhruv M (dhruvbird)