Flink / FLINK-10435

Client sporadically hangs after Ctrl + C


Details

    Description

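      When submitting a YARN job cluster in attached mode, the client hangs indefinitely if Ctrl + C is pressed while the cluster is still being deployed (in the output below, the YARN application is still in state ACCEPTED). The hung client can be terminated by sending it SIGKILL.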

      Command to submit job

      HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
      

      Output/Stacktrace

      [hadoop@ip-172-31-45-22 flink-1.5.4]$ HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
      Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.5.4/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      2018-09-26 12:01:04,241 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at ip-172-31-45-22.eu-central-1.compute.internal/172.31.45.22:8032
      2018-09-26 12:01:04,386 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
      2018-09-26 12:01:04,386 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
      2018-09-26 12:01:04,402 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
      2018-09-26 12:01:04,598 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
      2018-09-26 12:01:04,972 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - The configuration directory ('/home/hadoop/flink-1.5.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
      2018-09-26 12:01:07,857 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Submitting application master application_1537944258063_0017
      2018-09-26 12:01:07,913 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1537944258063_0017
      2018-09-26 12:01:07,913 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Waiting for the cluster to be allocated
      2018-09-26 12:01:07,916 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deploying cluster, current state ACCEPTED
      ^C2018-09-26 12:01:08,851 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Cancelling deployment from Deployment Failure Hook
      2018-09-26 12:01:08,854 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Killing YARN application
      
      ------------------------------------------------------------
       The program finished with the following exception:
      
      org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:410)
      	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:258)
      	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
      	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
      	at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
      	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
      Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state KILLED during deployment.
      Diagnostics from YARN: Application application_1537944258063_0017 was killed by user hadoop at 172.31.45.22
      If log aggregation is enabled on your cluster, use this command to further investigate the issue:
      yarn logs -applicationId application_1537944258063_0017
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1059)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:532)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:403)
      	... 9 more
      2018-09-26 12:01:09,065 INFO  org.apache.hadoop.io.retry.RetryInvocationHandler             - Exception while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Retrying after sleeping for 30000ms.
      java.io.IOException: The client is stopped
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1345)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
      	at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:213)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
      	at com.sun.proxy.$Proxy9.forceKillApplication(Unknown Source)
      	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:439)
      	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:419)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1236)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:111)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1493)
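
      The second trace shows where the client gets stuck. The DeploymentFailureHook, which runs after Ctrl + C, calls YarnClientImpl.killApplication on a YARN client that has apparently already been stopped. Hadoop's RetryInvocationHandler treats the resulting "java.io.IOException: The client is stopped" as retriable and sleeps 30000 ms before each new attempt. The JVM waits for shutdown hooks to finish before exiting, so a hook stuck in this loop keeps the process alive. The sketch below illustrates the general idea of a retry loop that treats a stopped client as a non-retriable failure. It is not Flink's or Hadoop's code: GuardedRetry, StoppableClient and callWithRetry are hypothetical names, and StoppableClient only simulates an RPC client that another thread may stop.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedRetry {

    /** Hypothetical stand-in for an RPC client that another thread may stop. */
    public static final class StoppableClient {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final AtomicInteger calls = new AtomicInteger();
        private final int transientFailures;

        public StoppableClient(int transientFailures) {
            this.transientFailures = transientFailures;
        }

        public void stop() { stopped.set(true); }

        public boolean isStopped() { return stopped.get(); }

        public int calls() { return calls.get(); }

        /** Fails permanently once stopped; otherwise fails only the first N calls. */
        public void call() throws IOException {
            int n = calls.incrementAndGet();
            if (stopped.get()) {
                throw new IOException("The client is stopped");
            }
            if (n <= transientFailures) {
                throw new IOException("transient failure " + n);
            }
        }
    }

    /**
     * Retries client.call() with a fixed backoff, up to maxAttempts times,
     * but gives up at once when the client is stopped, because waiting
     * cannot make a stopped client usable again. Returns true iff a call succeeded.
     */
    public static boolean callWithRetry(StoppableClient client, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                client.call();
                return true;
            } catch (IOException e) {
                if (client.isStopped() || attempt == maxAttempts) {
                    return false; // non-retriable failure, or attempts exhausted
                }
            }
            try {
                Thread.sleep(backoffMs); // back off only after a retriable failure
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return false; // maxAttempts < 1: nothing was attempted
    }

    public static void main(String[] args) {
        StoppableClient client = new StoppableClient(0);
        client.stop(); // e.g. closed by the main thread before the hook runs
        // 30_000 ms mirrors the retry interval in the log; it is never slept here
        boolean ok = callWithRetry(client, 10, 30_000);
        System.out.println("succeeded=" + ok + ", calls=" + client.calls());
    }
}
```

      Running main prints "succeeded=false, calls=1" immediately instead of sleeping 30 seconds between futile attempts, while transient failures on a live client are still retried.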
      

      Expected behavior
      The client should shut down the YARN cluster and exit.
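
      A general way to ensure that the client exits even when cleanup blocks is to bound the time a shutdown hook may spend on that cleanup. The sketch below is not Flink's fix: BoundedCleanup and runWithTimeout are hypothetical names, the printed "kill" is a placeholder for the real YARN kill request, and the 5-second limit is an arbitrary assumption. The cleanup runs on a daemon thread, and the hook waits for it at most timeoutMs before returning, which lets the JVM finish shutting down.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCleanup {

    /**
     * Runs task on a daemon thread and waits at most timeoutMs for it.
     * Returns true iff the task completed normally within the limit.
     */
    public static boolean runWithTimeout(Runnable task, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-cleanup");
            t.setDaemon(true); // a stuck task must never keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck task and give up
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical placeholder for "kill the YARN application".
        Runnable killApplication = () -> System.out.println("killing application (placeholder)");
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (!runWithTimeout(killApplication, 5_000)) {
                System.err.println("cleanup did not finish in time; exiting anyway");
            }
        }, "deployment-failure-hook"));
        System.out.println("main finished");
    }
}
```

      The trade-off is that a cleanup which times out may leave the YARN application running. Reporting that to the user (for example, with the application id) is usually preferable to a client that never exits.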

            People

              wangyang0918 Yang Wang
              gjy Gary Yao
              Votes: 0
              Watchers: 7

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m