Pig / PIG-2721

Wrong output generated while loading bags as input


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.2, 0.10.0, 0.11
    • Fix Version/s: 0.9.3, 0.11, 0.10.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      A = LOAD '/user/pvivek/sample' as (id:chararray,mybag:bag{tuple(bttype:chararray,cat:long)});
      B = foreach A generate id,FLATTEN(mybag) AS (bttype, cat);
      C = order B by id;
      dump C;
      

      The script above produces wrong results when run with Pig 0.9 and Pig 0.10.
      Sample input:

      ...LKGaHqg--	{(aa,806743)}
      ..0MI1Y37w--	{(aa,498970)}
      ..0bnlpJrw--	{(aa,806740)}
      ..0p0IIhbA--	{(aa,498971),(se,498995)}
      ..1VkGqvXA--	{(aa,805219)}
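
      For reference, the rows the script should produce on this sample can be worked out by hand. The sketch below is a plain Python emulation of the FLATTEN and ORDER BY steps, not Pig itself; the names SAMPLE and flatten_and_order are hypothetical, and the ids are copied as they appear above, leading dots included. The report does not list the actual wrong output, so only the expected rows are shown.

```python
# Sample rows copied from the report: (id, mybag), where mybag is a
# list of (bttype, cat) tuples. The ids are kept exactly as given.
SAMPLE = [
    ("...LKGaHqg--", [("aa", 806743)]),
    ("..0MI1Y37w--", [("aa", 498970)]),
    ("..0bnlpJrw--", [("aa", 806740)]),
    ("..0p0IIhbA--", [("aa", 498971), ("se", 498995)]),
    ("..1VkGqvXA--", [("aa", 805219)]),
]


def flatten_and_order(rows):
    """Emulate B = foreach A generate id, FLATTEN(mybag); C = order B by id;"""
    # FLATTEN emits one output row per tuple in the bag, repeating the id.
    flattened = [
        (row_id, bttype, cat)
        for row_id, bag in rows
        for bttype, cat in bag
    ]
    # ORDER BY id: plain string comparison on the id column. Python's sort
    # is stable, so rows sharing an id keep their bag order here; Pig makes
    # no such promise for ties.
    return sorted(flattened, key=lambda row: row[0])


expected = flatten_and_order(SAMPLE)
for row in expected:
    print(row)
```

      With the five input rows above this yields six rows, because the bag for ..0p0IIhbA-- holds two tuples. Every row should carry a non-empty bttype and cat taken from column $1, which is the column the optimizer reports as pruned.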
      

      I think the Pig optimizer is causing this issue. The logs show that column $1 is pruned for relation A, even though the FLATTEN in B needs it:

      [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for A: $1

      One workaround is to turn off the ColumnMapKeyPrune optimization by running Pig with -t ColumnMapKeyPrune.

        Attachments

        1. pig-2721-trunk-notestyet.patch
          0.8 kB
          Koji Noguchi
        2. pig-2721-trunk-withtest_v1.patch
          5 kB
          Koji Noguchi

            People

            • Assignee: Koji Noguchi (knoguchi)
            • Reporter: Vivek Padmanabhan (vivekp)
