Pig / PIG-1065

Indeterminate behaviour of UNION when there are 2 non-matching schemas

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

      Description

      I have a script that first takes a UNION of two relations with different schemas and then does an ORDER BY on the result.

      f1 = LOAD '1.txt' as (key:chararray, v:chararray);
      f2 = LOAD '2.txt' as (key:chararray);
      u0 = UNION f1, f2;
      describe u0;
      dump u0;
      
      u1 = ORDER u0 BY $0;
      dump u1;
      

      When I run in Map Reduce mode I get the following result:
      $ java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
      ====================
      Schema for u0 unknown.
      ====================
      (1,2)
      (2,3)
      (1)
      (2)
      ====================
      org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
      at org.apache.pig.PigServer.openIterator(PigServer.java:475)
      at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
      at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
      at org.apache.pig.Main.main(Main.java:397)
      ====================
      Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
      ====================

      When I run the same script in local mode I get a different result, since local mode does not use any Hadoop classes.
      $ java -cp pig.jar org.apache.pig.Main -x local broken.pig
      ====================
      Schema for u0 unknown
      ====================
      (1,2)
      (1)
      (2,3)
      (2)
      ====================
      (1,2)
      (1)
      (2,3)
      (2)
      ====================

      Here are some questions:
      1) Why do we allow UNION if the schemas do not match?
      2) Should we not print an error message or warning, so that the user knows this is not allowed or that they may get unexpected results?

      Viraj
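
      To make question 2 concrete, here is a minimal sketch of the kind of front-end check being asked for. It is not Pig's actual implementation: the names union_schema and check_union and the (alias, type) tuple representation are hypothetical, chosen only for illustration. It assumes two schemas are compatible only when they have the same number of fields and the same type at each position.

```python
from typing import List, Optional, Tuple

# Hypothetical schema representation: a list of (alias, type name) pairs.
Field = Tuple[str, str]


def union_schema(left: List[Field], right: List[Field]) -> Optional[List[Field]]:
    """Return the merged schema, or None when the inputs are incompatible."""
    if len(left) != len(right):
        return None  # different column counts: the result schema is unknown
    merged = []
    for (l_alias, l_type), (_, r_type) in zip(left, right):
        if l_type != r_type:
            return None  # assumption: no implicit casts are attempted
        merged.append((l_alias, l_type))
    return merged


def check_union(left: List[Field], right: List[Field]) -> str:
    """Describe whether a UNION of the two schemas is well defined."""
    schema = union_schema(left, right)
    if schema is None:
        return "WARNING: UNION inputs have non-matching schemas; result schema is unknown"
    return "OK: " + ", ".join(f"{alias}:{type_}" for alias, type_ in schema)


# Schemas of f1 and f2 from the script above.
f1 = [("key", "chararray"), ("v", "chararray")]
f2 = [("key", "chararray")]

print(check_union(f1, f2))  # mismatched column counts, so a warning
print(check_union(f1, f1))  # identical schemas, so the union is well defined
```

      With a check like this, the script above would be flagged at the UNION statement instead of failing later inside the ORDER BY job.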

          Activity

          Viraj Bhat created issue -
          Pradeep Kamath made changes -
          Field Original Value New Value
          Link This issue blocks PIG-998 [ PIG-998 ]
          Olga Natkovich made changes -
          Fix Version/s 0.7.0 [ 12314397 ]
          Fix Version/s 0.6.0 [ 12314214 ]
          Olga Natkovich made changes -
          Fix Version/s 0.7.0 [ 12314397 ]
          Olga Natkovich made changes -
          Fix Version/s 0.9.0 [ 12315191 ]
          Alan Gates made changes -
          Assignee Alan Gates [ alangates ]
          Daniel Dai made changes -
          Link This issue is duplicated by PIG-999 [ PIG-999 ]
          Daniel Dai made changes -
          Link This issue duplicates PIG-1277 [ PIG-1277 ]
          Daniel Dai made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Olga Natkovich made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee: Alan Gates
            • Reporter: Viraj Bhat
            • Votes: 0
            • Watchers: 1
