FLINK-8891

RestServerEndpoint can bind on local address only

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.0
    • Environment: EC2 AMI debian-jessie-amd64-hvm-2017-01-15-1221-ebs (ami-5900cc36)
      Hadoop 2.8.3
      Flink commit 80020cb5866c8bac67a48f89aa481de7de262f83

    Description

      When deploying a Flink session on YARN, the DispatcherRestEndpoint may incorrectly bind to a local address. When this happens, job submission and all REST API calls that use a non-local address fail. Setting rest.address: 0.0.0.0 in flink-conf.yaml has no effect because the value is overridden.

      znode leader contents

      [zk: localhost:2181(CONNECTED) 3] get /flink/application_1520439896153_0001/leader/rest_server_lock
      ??whttp://127.0.1.1:56299srjava.util.UUID????m?/J
                                                       leastSigBitsJ
                                                                    mostSigBitsxp??L???g?M??KFK
      cZxid = 0x10000000a
      ctime = Wed Mar 07 16:25:21 UTC 2018
      mZxid = 0x10000000a
      mtime = Wed Mar 07 16:25:21 UTC 2018
      pZxid = 0x10000000a
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0x5620147c1220000
      dataLength = 106
      numChildren = 0
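
      The znode above advertises http://127.0.1.1:56299 as the REST endpoint. A minimal sketch of how a client-side check could flag such an address as unusable from another host follows. The class AdvertisedAddressCheck and its method are hypothetical and not part of Flink, and the non-loopback address in main is only an assumption inferred from the JobManager host name in the client log.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration only; not part of Flink. */
final class AdvertisedAddressCheck {

    /** Returns true if the host part of the URL is a loopback address such as 127.0.1.1. */
    static boolean isLoopback(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        try {
            // For IP literals, getByName only validates the format; no name lookup happens.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host: " + host, e);
        }
    }

    public static void main(String[] args) {
        // Address taken from the rest_server_lock znode in this issue.
        System.out.println(isLoopback("http://127.0.1.1:56299"));     // prints "true"
        // Assumed private address of the JobManager host (inferred from its host name).
        System.out.println(isLoopback("http://172.31.44.106:56299")); // prints "false"
    }
}
```

      A loopback result means the address is only reachable from the machine that published it, which matches the "Connection refused: /127.0.1.1:56299" seen by the remote client.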
      

      Contents of /etc/hosts

      127.0.1.1 ip-172-31-36-187.eu-central-1.compute.internal ip-172-31-36-187
      127.0.0.1 localhost
      
      # The following lines are desirable for IPv6 capable hosts
      ::1 ip6-localhost ip6-loopback
      fe00::0 ip6-localnet
      ff00::0 ip6-mcastprefix
      ff02::1 ip6-allnodes
      ff02::2 ip6-allrouters
      ff02::3 ip6-allhosts
      

      Note that the problem does not appear without the first line, which maps the machine's host name to 127.0.1.1.
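
      To spot affected machines, the /etc/hosts content can be scanned for host names other than localhost that are mapped into the IPv4 loopback range. The following is a rough sketch under that assumption; HostsLoopbackCheck is a hypothetical helper, not Flink code, and it deliberately ignores IPv6 entries. The main method additionally prints what the JVM resolves for the local host, which is only one plausible way to observe the symptom and not necessarily the code path Flink takes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink. */
final class HostsLoopbackCheck {

    /** Returns all names except "localhost" that a hosts file maps to 127.0.0.0/8. */
    static List<String> namesOnLoopback(String hostsContent) {
        List<String> names = new ArrayList<>();
        for (String line : hostsContent.split("\\R")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue;
            }
            String[] fields = entry.split("\\s+");
            // Only the IPv4 loopback range is considered in this sketch.
            if (!fields[0].startsWith("127.")) {
                continue;
            }
            for (int i = 1; i < fields.length; i++) {
                if (!fields[i].equals("localhost")) {
                    names.add(fields[i]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws UnknownHostException {
        String hosts = String.join("\n",
                "127.0.1.1 ip-172-31-36-187.eu-central-1.compute.internal ip-172-31-36-187",
                "127.0.0.1 localhost",
                "::1 ip6-localhost ip6-loopback");
        System.out.println(namesOnLoopback(hosts));

        // Runtime view: what the JVM resolves for this machine's own host name.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local + " loopback=" + local.isLoopbackAddress());
    }
}
```

      For the hosts file shown above, the scan reports both aliases from the first line, which is consistent with the observation that removing that line makes the problem disappear.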

      Error message & Stacktrace

      2018-03-07 16:25:24,267 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'ip-172-31-44-106.eu-central-1.compute.internal' and port '56299' from supplied application id 'application_1520439896153_0001'
      Using the parallelism provided by the remote cluster (0). To use another parallelism, set it at the ./bin/flink client.
      Starting execution of program
      
      
      STDERR:
      
      ------------------------------------------------------------
       The program finished with the following exception:
      
      org.apache.flink.client.program.ProgramInvocationException: Could not submit job 6243b830a6cb1a0b6605a15a7d3d81d4.
      	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:231)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:457)
      	at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:403)
      	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:780)
      	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:274)
      	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:209)
      	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1019)
      	at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1095)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
      	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1095)
      Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
      	at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$4(RestClusterClient.java:327)
      	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
      	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
      	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
      	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
      	at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:196)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: /127.0.1.1:56299
      	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
      	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
      	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
      	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
      	... 16 more
      Caused by: java.net.ConnectException: Connection refused: /127.0.1.1:56299
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      	at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
      	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
      	... 7 more
      

            People

              Assignee: Gary Yao
              Reporter: Gary Yao