Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-616

Casts to complex types do not work as expected

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.2.0
    • None
    • impl
    • None

    Description

      When we specify a (complex) type as a column in Pig, the TypeCastInserter inserts the appropriate cast for the (complex) type. However, in the implementation of POCast.java, when databyte arrays are converted to the (complex) types, we invoke the bytesToXXX method.

      For complex types, especially tuples and bags, we do not enforce the typing information specified by the user in the AS clause or with the explicit cast statement. The implementation solely relies on bytesToXXX to figure out the right type.

      An example of a query that fails is given below. Wrt the query, the data is a single column that is a bag of integers. The user specifies this bag to be a bag of chararray. This conversion is allowed in pig but the implementation does not perform the actual cast. Instead the bytesToBag is called on the stream. The resulting type is a bag of integers and not a bag of chararray. In the subsequent statement the user (correctly) assumes that the conversion has been performed but in reality it has not been done. At run time when a chararray based operation is performed we see a ClassCastException.

      The notion of a schema has is absent in the physical operators. The schema/fieldSchema in the logical layer has to be passed on to the physical layer. The schema can be used to perform additional operations like casting, etc.

      grunt> cat bag.data
      {(1)}
      
      grunt> a = load 'bag.data' as (b:{t:(c:chararray)});
      grunt> b = foreach a generate flatten(b);
      grunt> c = foreach b generate CONCAT('Hello ', $0);
      grunt> dump c;
      
      2009-01-12 10:44:44,417 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
      2009-01-12 10:45:09,439 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
      2009-01-12 10:45:09,440 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Job failed!
      2009-01-12 10:45:09,443 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200812151518_9681_m_000000java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
              at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:37)
              at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:31)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:259)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:271)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
              at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
      ...
      
      2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias c
      
      2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator for alias c
              at org.apache.pig.PigServer.openIterator(PigServer.java:426)
              at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
              at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
              at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
              at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
              at org.apache.pig.Main.main(Main.java:302)
      
      Caused by: java.io.IOException: Job terminated with anomalous status FAILED
              at org.apache.pig.PigServer.openIterator(PigServer.java:420)
              ... 5 more
      
      

      Attachments

        Issue Links

          Activity

            People

              gates Alan Gates
              sms Santhosh Muthur Srinivasan
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: