Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4675

Operators with multiple predecessors fail under multiquery optimization

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • spark-branch
    • spark
    • None

    Description

      We are testing the spark branch pig recently with mapr3 and spark 1.5. It turns out if we use more than 1 store command in the pig script will have exception from the second store command.

      SSN = load '/test/ssn.txt' using PigStorage() as (ssn:long);
      SSN_NAME = load '/test/name.txt' using PigStorage() as (ssn:long, name:chararray);
      X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn USING 'replicated';
      R1 = limit SSN_NAME 10;
      store R1 into '/tmp/test1_r1';
      store X into '/tmp/test1_x';

      Exception Details:

      15/09/11 13:37:00 INFO storage.MemoryStore: ensureFreeSpace(114448) called with curMem=359237, maxMem=503379394
      15/09/11 13:37:00 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 111.8 KB, free 479.6 MB)
      15/09/11 13:37:00 INFO storage.MemoryStore: ensureFreeSpace(32569) called with curMem=473685, maxMem=503379394
      15/09/11 13:37:00 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 31.8 KB, free 479.6 MB)
      15/09/11 13:37:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.51.2.82:55960 (size: 31.8 KB, free: 479.9 MB)
      15/09/11 13:37:00 INFO spark.SparkContext: Created broadcast 2 from newAPIHadoopRDD at LoadConverter.java:88
      15/09/11 13:37:00 WARN util.ClosureCleaner: Expected a closure; got org.apache.pig.backend.hadoop.executionengine.spark.converter.LoadConverter$ToTupleFunction
      15/09/11 13:37:00 INFO spark.SparkLauncher: Converting operator POForEach (Name: SSN: New For Each(false)[bag] - scope-17 Operator Key: scope-17)
      15/09/11 13:37:00 INFO spark.SparkLauncher: Converting operator POFRJoin (Name: X: FRJoin[tuple] - scope-22 Operator Key: scope-22)
      15/09/11 13:37:00 ERROR spark.SparkLauncher: throw exception in sparkOperToRDD:
      java.lang.RuntimeException: Should have greater than1 predecessors for class org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin. Got : 1
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkUtil.assertPredecessorSizeGreaterThan(SparkUtil.java:93)
      at org.apache.pig.backend.hadoop.executionengine.spark.converter.FRJoinConverter.convert(FRJoinConverter.java:55)
      at org.apache.pig.backend.hadoop.executionengine.spark.converter.FRJoinConverter.convert(FRJoinConverter.java:46)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:633)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:600)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:621)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkOperToRDD(SparkLauncher.java:552)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkPlanToRDD(SparkLauncher.java:501)
      at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:204)
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:301)
      at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
      at org.apache.pig.PigServer.execute(PigServer.java:1364)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
      at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
      at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
      at org.apache.pig.Main.run(Main.java:624)
      at org.apache.pig.Main.main(Main.java:170)

      Attachments

        1. test.pig
          0.4 kB
          Peter Lin
        2. ssn.txt
          0.0 kB
          Peter Lin
        3. PIG-4675_3.patch
          14 kB
          liyunzhang
        4. PIG-4675_2.patch
          14 kB
          liyunzhang
        5. PIG-4675_1.patch
          13 kB
          liyunzhang
        6. name.txt
          0.0 kB
          Peter Lin

        Issue Links

          Activity

            People

              kellyzly liyunzhang
              linyunfeng99512 Peter Lin
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: