  1. Flink
  2. FLINK-17443

Flink's ZK in HA mode setup is unable to start up if any of the zk hosts are unreachable

Details

    Description

      We occasionally hit an issue where our Flink cluster will not start up if any of the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum" config setting is unreachable (in the case below, its hostname fails to resolve).

      This seems to stem from our being on an older ZooKeeper dependency version (3.4.10). The problem has been fixed in the 3.4.x branch as part of https://issues.apache.org/jira/browse/ZOOKEEPER-1576 (https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e). A sample error is shown below.

      java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
        at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
        at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
        at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
        at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
        at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
        at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
        at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
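
      To illustrate the failure mode in the trace above, here is a minimal Java sketch contrasting strict host resolution (one unresolvable quorum host aborts startup) with a tolerant variant that skips unresolvable hosts and fails only when none resolve. This is not the ZooKeeper or Flink code: the class QuorumResolution, the Resolver interface and both method names are hypothetical, and the tolerant behaviour is only assumed to be in the spirit of the linked ZOOKEEPER-1576 fix.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual StaticHostProvider implementation.
public class QuorumResolution {

    /** Hypothetical hook so that name resolution can be stubbed out in tests. */
    public interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Default resolver: delegates to the JDK lookup that appears in the stack trace. */
    public static final Resolver SYSTEM = InetAddress::getAllByName;

    /** Strict: the first unresolvable host propagates UnknownHostException. */
    public static List<InetAddress> resolveStrict(List<String> hosts, Resolver resolver)
            throws UnknownHostException {
        List<InetAddress> resolved = new ArrayList<>();
        for (String host : hosts) {
            for (InetAddress address : resolver.resolve(host)) {
                resolved.add(address);
            }
        }
        return resolved;
    }

    /** Tolerant: skip unresolvable hosts; fail only if no host resolves at all. */
    public static List<InetAddress> resolveTolerant(List<String> hosts, Resolver resolver) {
        List<InetAddress> resolved = new ArrayList<>();
        for (String host : hosts) {
            try {
                for (InetAddress address : resolver.resolve(host)) {
                    resolved.add(address);
                }
            } catch (UnknownHostException e) {
                // Report and continue with the remaining quorum members.
                System.err.println("Skipping unresolvable host: " + host);
            }
        }
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException("No quorum host could be resolved");
        }
        return resolved;
    }
}
```

      With a three-host quorum in which one hostname does not resolve, resolveStrict throws on that host, while resolveTolerant returns the addresses of the other two. Passing QuorumResolution.SYSTEM as the resolver performs real DNS lookups.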

            People

              Assignee: Unassigned
              Reporter: Piyush Narang (pnarang)