Flink / FLINK-1835

Spurious failure of YARN tests

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 0.9
    • Component/s: Deployment / YARN
    • Labels: None

    Description

      The test failure was triggered by an exception detected in the log.

      Stack trace of the exception (extracted from the log) below:

      21:18:29,555 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      21:18:29,806 INFO  org.apache.flink.yarn.ApplicationMaster$                      - YARN daemon runs as travis setting user to execute Flink ApplicationMaster/JobManager to travis
      21:18:29,808 INFO  org.apache.flink.yarn.ApplicationMaster$                      - --------------------------------------------------------------------------------
      21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Starting YARN ApplicationMaster/JobManager (Version: 0.9-SNAPSHOT, Rev:d2020b5, Date:06.04.2015 @ 18:00:21 UTC)
      21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Current user: travis
      21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.31-b07
      21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Maximum heap size: 393 MiBytes
      21:18:29,826 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JAVA_HOME: /usr/lib/jvm/java-8-oracle
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JVM Options:
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Xmx409M
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlog.file=/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1428355034517_0004/container_1428355034517_0004_01_000001/jobmanager-main.log
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlogback.configurationFile=file:logback.xml
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlog4j.configuration=file:log4j.properties
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Program Arguments: (none)
      21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      - --------------------------------------------------------------------------------
      21:18:29,828 INFO  org.apache.flink.yarn.ApplicationMaster$                      - registered UNIX signal handlers for [TERM, HUP, INT]
      21:18:29,843 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting JobManager for YARN
      21:18:29,845 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Loading config from: /home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001
      21:18:30,388 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
      21:18:30,450 INFO  Remoting                                                      - Starting remoting
      21:18:30,637 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@172.17.0.176:34023]
      21:18:30,651 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-e34b86da-094c-4a4e-aa02-7b0556e8af93
      21:18:30,655 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:33717 - max concurrent requests: 50 - max backlog: 1000
      21:18:30,670 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting Job Manger web frontend.
      21:18:30,673 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Setting up web info server, using web-root directory jar:file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/filecache/12/flink-dist-0.9-SNAPSHOT.jar!/web-docs-infoserver.
      21:18:30,705 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager at akka://flink/user/jobmanager#395299512.
      21:18:31,184 INFO  org.eclipse.jetty.util.log                                    - jetty-0.9-SNAPSHOT
      21:18:31,269 INFO  org.eclipse.jetty.util.log                                    - Started SelectChannelConnector@0.0.0.0:49867
      21:18:31,270 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Started web info server for JobManager on 0.0.0.0:49867
      21:18:31,270 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Generate configuration file for application master.
      21:18:31,283 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting YARN session on Job Manager.
      21:18:31,284 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Application Master properly initiated. Awaiting termination of actor system.
      21:18:31,287 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Start yarn session.
      21:18:31,489 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Requesting 1 TaskManagers. Tolerating 1 failed TaskManagers
      21:18:31,815 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8030
      21:18:31,914 INFO  org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - yarn.client.max-cached-nodemanagers-proxies : 0
      21:18:31,915 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Registering ApplicationMaster with tracking url http://testing-worker-linux-docker-2f4f6c00-3426-linux-13.prod.travis-ci.org:49867.
      21:18:32,255 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Requesting initial TaskManager container 0.
      21:18:32,283 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001/flink-conf-modified.yaml to file:/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml
      21:18:32,458 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Prepared local resource for modified yaml: resource { scheme: "file" port: -1 file: "/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml" } size: 3393 timestamp: 1428355112000 type: FILE visibility: APPLICATION
      21:18:32,461 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Create container launch context.
      21:18:32,483 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Starting TM with command=$JAVA_HOME/bin/java -Xmx819m  -Dlog.file="<LOG_DIR>/taskmanager.log" -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.appMaster.YarnTaskManagerRunner --configDir . 1> <LOG_DIR>/taskmanager-stdout.log 2> <LOG_DIR>/taskmanager-stderr.log
      21:18:33,077 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - The user requested 1 containers, 0 running. 1 containers missing
      21:18:33,631 ERROR akka.actor.OneForOneStrategy                                  - Application attempt appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
      
      org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:483)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy8.allocate(Unknown Source)
      	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
      	at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:190)
      	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
      	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
      	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
      	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
      	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
      	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
      	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:91)
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
      	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
      	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
      	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
      	at com.sun.proxy.$Proxy7.allocate(Unknown Source)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
      	... 26 more
      21:18:33,646 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Stopping JobManager akka://flink/user/jobmanager#395299512.
      21:18:33,701 ERROR org.apache.flink.yarn.ApplicationMaster$                      - RECEIVED SIGNAL 15: SIGTERM
      21:19:52,986 INFO  org.apache.flink.yarn.YarnTestBase                            - Shutting down MiniYarn cluster
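
The failure mode described above (a test failing because an exception shows up in a container log) can be illustrated with a minimal sketch of a log scanner. This is not the actual Flink test code: the class name `LogScanner`, the method `findProhibited`, and the idea of a whitelist parameter are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the Flink code base.
final class LogScanner {

    /**
     * Returns every line that contains at least one prohibited term
     * and none of the whitelisted terms.
     */
    static List<String> findProhibited(List<String> lines,
                                       List<String> prohibited,
                                       List<String> whitelisted) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            boolean bad = prohibited.stream().anyMatch(line::contains);
            boolean allowed = whitelisted.stream().anyMatch(line::contains);
            if (bad && !allowed) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

With "Exception" as the only prohibited term and an empty whitelist, a check of this kind would flag the `ApplicationAttemptNotFoundException` line in the log above and could fail the test even if the job itself behaved correctly. Whether whitelisting that exception is an appropriate remedy is a separate question that this sketch does not answer.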
      

          People

            rmetzger Robert Metzger
            sewen Stephan Ewen