PIG-5320: TestCubeOperator#testRollupBasic is flaky on Spark 2.2


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: spark
    • Labels: None

      Description

      TestCubeOperator#testRollupBasic occasionally fails with

      org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias c
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1779)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
      	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1110)
      	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:512)
      	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
      	at org.apache.pig.PigServer.registerScript(PigServer.java:781)
      	at org.apache.pig.PigServer.registerScript(PigServer.java:858)
      	at org.apache.pig.PigServer.registerScript(PigServer.java:821)
      	at org.apache.pig.test.Util.registerMultiLineQuery(Util.java:972)
      	at org.apache.pig.test.TestCubeOperator.testRollupBasic(TestCubeOperator.java:124)
      Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator: 
      	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
      	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
      	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
      	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
      	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
      	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:237)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
      	at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
      	at org.apache.pig.PigServer.execute(PigServer.java:1449)
      	at org.apache.pig.PigServer.access$500(PigServer.java:119)
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1774)
      Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
      	at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
      	at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
      	at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
      	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
      	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
      

      I think the problem is that in JobStatisticCollector#waitForJobToEnd, sparkListener.wait() is not called inside a loop, as recommended in wait()'s javadoc:

           * As in the one argument version, interrupts and spurious wakeups are
           * possible, and this method should always be used in a loop:
      

      Thus, due to a spurious wakeup, wait() can return without notify() ever having been called, so stats collection proceeds while the job is still in RUNNING state, producing the "Unexpected job execution status RUNNING" failure above. A sketch of the guarded-wait idiom is shown below.
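
      A minimal sketch of the wait-in-a-loop idiom; the class, field, and method names below (JobWaitSketch, finishedJobIds, jobEnded) are hypothetical illustrations, not the actual JobStatisticCollector code:

          import java.util.HashSet;
          import java.util.Set;

          // Hypothetical sketch, not the real JobStatisticCollector: it only
          // illustrates guarding wait() with a condition re-checked in a loop.
          public class JobWaitSketch {
              private final Object sparkListener = new Object();
              private final Set<Integer> finishedJobIds = new HashSet<>();

              public void waitForJobToEnd(int jobId) throws InterruptedException {
                  synchronized (sparkListener) {
                      // Re-check the condition after every wakeup: wait() may
                      // return spuriously, without a matching notify()/notifyAll().
                      while (!finishedJobIds.contains(jobId)) {
                          sparkListener.wait();
                      }
                  }
              }

              // Called from the Spark listener callback when a job finishes.
              public void jobEnded(int jobId) {
                  synchronized (sparkListener) {
                      finishedJobIds.add(jobId);
                      sparkListener.notifyAll();
                  }
              }
          }

      With the loop, a spurious wakeup simply re-checks the condition and waits again, instead of letting the caller proceed before the job has actually finished.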

        Attachments

        1. PIG-5320_1.patch (2 kB, Nándor Kollár)
        2. PIG-5320_2.patch (1 kB, Nándor Kollár)


            People

            • Assignee: Nándor Kollár (nkollar)
            • Reporter: Nándor Kollár (nkollar)
            • Votes: 0
            • Watchers: 3
