Details
Bug
Status: Closed
Major
Resolution: Fixed
1.0.1
None
# java -version
openjdk version "1.8.0_77"
OpenJDK Runtime Environment (build 1.8.0_77-b03)
OpenJDK 64-Bit Server VM (build 25.77-b03, mixed mode)
Description
When I start a standalone cluster with the `bin/jobmanager.sh start cluster` command, everything works fine, but when I use the same command for an HA cluster, the JobManager raises an error and stops:
log/flink--jobmanager-0-example-app-1.example.local.out
Exception in thread "main" scala.MatchError: ({blob.server.port=6130, state.backend.fs.checkpointdir=s3://s3.example.com/example_staging_flink/checkpoints, blob.storage.directory=/flink/data/blob_storage, jobmanager.heap.mb=1024, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, restart-strategy.fixed-delay.attempts=2, recovery.mode=zookeeper, jobmanager.web.port=8081, taskmanager.memory.preallocate=false, jobmanager.rpc.port=0, flink.base.dir.path=/flink/conf/.., recovery.zookeeper.storageDir=s3://s3.example.com/example_staging_flink/recovery, taskmanager.tmp.dirs=/flink/data/task_manager, restart-strategy.fixed-delay.delay=60s, taskmanager.data.port=6121, recovery.zookeeper.path.root=/example_staging/flink, parallelism.default=4, taskmanager.numberOfTaskSlots=4, recovery.zookeeper.quorum=zookeeper-1.example.local:2181,zookeeper-2.example.local:2181,zookeeper-3.example.local:2181, fs.hdfs.hadoopconf=/flink/conf, state.backend=filesystem, restart-strategy=none, recovery.jobmanager.port=6123, taskmanager.heap.mb=2048},CLUSTER,null,org.apache.flink.shaded.com.google.common.collect.Iterators$5@3bf7ca37) (of class scala.Tuple4)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1605)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
log/flink--jobmanager-0-example-app-1.example.local.log
2016-04-11 10:58:31,680 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, about=, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
2016-04-11 10:58:31,696 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, about=, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
2016-04-11 10:58:31,697 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, about=, type=DEFAULT, valueName=Time, value=[GetGroups])
2016-04-11 10:58:31,699 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
2016-04-11 10:58:31,951 DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:303)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:328)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:272)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:790)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
    at org.apache.flink.runtime.util.EnvironmentInformation.getUserRunning(EnvironmentInformation.java:90)
    at org.apache.flink.runtime.util.EnvironmentInformation.logEnvironmentInfo(EnvironmentInformation.java:284)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1597)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-04-11 10:58:32,045 DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
2016-04-11 10:58:32,052 DEBUG org.apache.hadoop.security.authentication.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
2016-04-11 10:58:32,057 DEBUG org.apache.hadoop.security.Groups - Creating new Groups object
2016-04-11 10:58:32,059 DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
2016-04-11 10:58:32,060 DEBUG org.apache.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2016-04-11 10:58:32,060 DEBUG org.apache.hadoop.util.NativeCodeLoader - java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-04-11 10:58:32,061 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-04-11 10:58:32,061 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Falling back to shell based
2016-04-11 10:58:32,065 DEBUG org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2016-04-11 10:58:32,189 DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2016-04-11 10:58:32,198 DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login
2016-04-11 10:58:32,200 DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login commit
2016-04-11 10:58:32,203 DEBUG org.apache.hadoop.security.UserGroupInformation - using local user:UnixPrincipal: root
2016-04-11 10:58:32,204 DEBUG org.apache.hadoop.security.UserGroupInformation - Using user: "UnixPrincipal: root" with name root
2016-04-11 10:58:32,204 DEBUG org.apache.hadoop.security.UserGroupInformation - User entry: "root"
2016-04-11 10:58:32,205 DEBUG org.apache.hadoop.security.UserGroupInformation - UGI loginUser:root (auth:SIMPLE)
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 1.0.1, Rev:4afa401, Date:31.03.2016 @ 13:40:33 UTC)
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: root
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.77-b03
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 981 MiBytes
2016-04-11 10:58:32,206 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: (not set)
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - Hadoop version: 2.7.2
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms1024m
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx1024m
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog.file=/flink/log/flink--jobmanager-0-example-app-1.example.local.log
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/flink/conf/log4j.properties
2016-04-11 10:58:32,209 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/flink/conf/logback.xml
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments:
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - /flink/conf
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - cluster
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /flink/lib/flink-dist_2.11-1.0.1.jar:/flink/lib/flink-python_2.11-1.0.1.jar:/flink/lib/log4j-1.2.17.jar:/flink/lib/slf4j-log4j12-1.7.7.jar:::
2016-04-11 10:58:32,210 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
2016-04-11 10:58:32,211 INFO org.apache.flink.runtime.jobmanager.JobManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2016-04-11 10:58:32,328 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /flink/conf
2016-04-11 10:58:32,342 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.s3.impl, org.apache.hadoop.fs.s3.S3FileSystem
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.data.port, 6121
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 2048
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2016-04-11 10:58:32,344 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.tmp.dirs, /flink/data/task_manager
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6130
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.storage.directory, /flink/data/blob_storage
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 4
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, s3://s3.example.com/example_staging_flink/checkpoints
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: restart-strategy, none
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: restart-strategy.fixed-delay.attempts, 2
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: restart-strategy.fixed-delay.delay, 60s
2016-04-11 10:58:32,345 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.mode, zookeeper
2016-04-11 10:58:32,346 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.quorum, zookeeper-1.example.local:2181,zookeeper-2.example.local:2181,zookeeper-3.example.local:2181
2016-04-11 10:58:32,346 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.root, /example_staging/flink
2016-04-11 10:58:32,346 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.storageDir, s3://s3.example.com/example_staging_flink/recovery
2016-04-11 10:58:32,346 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.jobmanager.port, 6123
2016-04-11 10:58:32,346 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /flink/conf
2016-04-11 10:58:32,350 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager with high-availability
2016-04-11 10:58:32,357 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager on 127.0.0.1:6123 with execution mode CLUSTER
However, when I run the JobManager with the command `bin/jobmanager.sh start cluster <hostname>`, everything works fine again.
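
The third element of the `Tuple4` in the `MatchError` is `null`, and passing `<hostname>` avoids the failure, so that element is presumably the listening host. One possible explanation (an assumption, not confirmed against the Flink source) is that the tuple is matched with type-annotated patterns, which never match `null` in Scala. The sketch below reproduces that behavior in isolation; `parseArgs`, `typedMatch`, and `tolerantMatch` are hypothetical names for illustration, not Flink APIs.

```scala
object MatchErrorSketch {
  // Hypothetical stand-in for parsed arguments: (config, execution mode, host).
  // The host is null when no hostname is passed on the command line.
  def parseArgs(host: String): (Any, Any, Any) =
    (Map("recovery.mode" -> "zookeeper"), "CLUSTER", host)

  // Typed patterns such as `host: String` never match null, so a null host
  // makes the whole tuple pattern fail with scala.MatchError.
  def typedMatch(parsed: (Any, Any, Any)): String = parsed match {
    case (_: Map[_, _], mode: String, host: String) => s"$mode on $host"
  }

  // Untyped bindings accept null; the missing host is handled explicitly.
  def tolerantMatch(parsed: (Any, Any, Any)): String = parsed match {
    case (_, mode, host) => s"$mode on ${Option(host).getOrElse("<unset>")}"
  }

  def main(args: Array[String]): Unit = {
    // With a hostname, the typed match succeeds.
    println(typedMatch(parseArgs("example-app-1.example.local")))

    // Without one, the typed match throws scala.MatchError.
    val outcome =
      try typedMatch(parseArgs(null))
      catch { case e: MatchError => s"MatchError: ${e.getMessage}" }
    println(outcome)

    // The tolerant variant still produces a result.
    println(tolerantMatch(parseArgs(null)))
  }
}
```

If this is indeed the cause, binding the host without a type annotation (or wrapping it in `Option`) would let the HA start-up proceed without an explicit hostname argument.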