Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0, 1.2.2
    • Component/s: Local Runtime
    • Labels: None

      Description

      PageRank (see the PR from FLINK-4896) fails with the following error. To reproduce, change AsmTestBase:63 to env = executionEnvironment.createLocalEnvironment(); and then run PageRankTest: the Simple and RMat graph tests fail, while the Complete graph test succeeds.

      org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
      	at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
      	at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
      	at org.apache.flink.graph.asm.dataset.AbstractDataSetAnalytic.execute(AbstractDataSetAnalytic.java:55)
      	at org.apache.flink.graph.drivers.PageRank.print(PageRank.java:113)
      	at org.apache.flink.graph.Runner.main(Runner.java:257)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
      	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
      	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:834)
      	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)
      	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1076)
      	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1123)
      	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
      	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
      	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1120)
      Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
      	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:905)
      	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:848)
      	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:848)
      	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
      	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
      	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: java.lang.RuntimeException: An error occurred creating the temp table.
      	at org.apache.flink.runtime.operators.TempBarrier.getIterator(TempBarrier.java:98)
      	at org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1090)
      	at org.apache.flink.runtime.operators.BatchTask.resetAllInputs(BatchTask.java:895)
      	at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:136)
      	at org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107)
      	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: The dam has been closed.
      	at org.apache.flink.runtime.operators.TempBarrier.close(TempBarrier.java:115)
      	at org.apache.flink.runtime.operators.BatchTask.resetAllInputs(BatchTask.java:886)
      	... 5 more
      

          Activity

          StephanEwen Stephan Ewen added a comment -

          Fixed in

          • 1.3.0 via c9746846b357d8ce538ff872cea60c52b1904b43
          • 1.2.2 via 664c49df7e76b61fbf84cf1b416dff5d0cfdd2ac
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3747

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3747

          Merging this...

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3747

          +1 to this fix

          githubbot ASF GitHub Bot added a comment -

          GitHub user greghogan opened a pull request:

          https://github.com/apache/flink/pull/3747

          FLINK-5623 [runtime] Fix TempBarrier dam has been closed

          Properly reset the "pipeline breaker" upon closing.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/greghogan/flink 5623_tempbarrier_dam_has_been_closed

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3747.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3747


          commit 4ff4c1238647a5ff4f80dbc012ce5e7b39391363
          Author: Greg Hogan <code@greghogan.com>
          Date: 2017-04-20T12:46:01Z

          FLINK-5623 [runtime] Fix TempBarrier dam has been closed

          Properly reset the "pipeline breaker" upon closing.


          StephanEwen Stephan Ewen added a comment -

          I think the issue is that the "pipeline breaker" is not properly reset. The logic closes it, but when the input is re-created, the closed instance is referenced again.

          The fix could be to add the following line to BatchTask, at line 887:

          ...
          this.tempBarriers[i].close();
          this.tempBarriers[i] = null;   // <<= this is the new line
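
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Flink's code: the Barrier and Task classes, the clearOnReset flag, and the survivesSecondIteration helper are hypothetical names invented for this example, and the exception type is simplified. It only models the idea that a closed barrier which is still referenced gets reused on the next iteration, whereas clearing the reference forces a fresh one to be created.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PipelineBreakerSketch {

    /** Hypothetical stand-in for a barrier that buffers input and cannot be read once closed. */
    static final class Barrier {
        private final List<Integer> buffered;
        private boolean closed;

        Barrier(List<Integer> input) {
            this.buffered = new ArrayList<>(input);
        }

        Iterator<Integer> getIterator() {
            if (closed) {
                // Mirrors the message in the reported trace; the real exception type differs.
                throw new IllegalStateException("The dam has been closed.");
            }
            return buffered.iterator();
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical task with a single barrier slot that is reset between iterations. */
    static final class Task {
        private final boolean clearOnReset;
        private Barrier barrier;

        Task(boolean clearOnReset) {
            this.clearOnReset = clearOnReset;
        }

        void reset() {
            if (barrier != null) {
                barrier.close();
                if (clearOnReset) {
                    barrier = null; // the proposed extra line: drop the closed instance
                }
            }
        }

        Iterator<Integer> getInput(List<Integer> data) {
            if (barrier == null) {
                barrier = new Barrier(data); // only re-created when the slot is empty
            }
            return barrier.getIterator();
        }
    }

    /** Runs two iterations with a reset in between; returns false if the second read fails. */
    static boolean survivesSecondIteration(boolean clearOnReset) {
        Task task = new Task(clearOnReset);
        List<Integer> data = Arrays.asList(1, 2, 3);
        task.getInput(data);
        task.reset();
        try {
            task.getInput(data);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reference kept after close:    " + survivesSecondIteration(false));
        System.out.println("reference cleared after close: " + survivesSecondIteration(true));
    }
}
```

In this model, the only difference between the failing and the working variant is whether the slot is set to null after close(), which corresponds to the single added line proposed above.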
          
          greghogan Greg Hogan added a comment -

          Fabian Hueske or Stephan Ewen, would you have a chance to look at this bug? A quick way to trigger the error is to rebase onto master, remove the superfluous map from org.apache.flink.graph.library.link_analysis.PageRank:181, and execute org.apache.flink.graph.drivers.PageRankITCase. I can pursue any suggestions you might have.

          greghogan Greg Hogan added a comment -

          The error does not occur when the graph is simplified before running the algorithm.


            People

            • Assignee: greghogan Greg Hogan
            • Reporter: greghogan Greg Hogan
            • Votes: 0
            • Watchers: 3
