Flink / FLINK-28947

Curator framework fails with NullPointerException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.15.1
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      I'm getting the following error in the JobManager, and as a result the JobManager exits.

      Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,491] ERROR Background exception was not retry-able or retry gave up (org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl:733)
      Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,493] ERROR Unhandled error in curator framework, error message: Background exception was not retry-able or retry gave up (org.apache.flink.runtime.util.ZooKeeperUtils:292)
      Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,494] ERROR Fatal error occurred while executing the TaskManager. Shutting it down... (org.apache.flink.runtime.taskexecutor.TaskManagerRunner:427)
      Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
      

      Steps

      • Create three servers
      • Run Flink JobManager and TaskManager on all of them (let's call these A, B and C). Use ZooKeeper HA Services.
      • Everything works as expected
      • Add a new server (D).
      • Shut down server C.
      • This error can be seen on both servers A and D. I didn't check B and C.

      This can be reproduced (apparently) with every execution.

      I'm using Flink 1.15.1. Actually I'm migrating from 1.13.X to 1.15.X. I'm not totally sure whether this ever happens on 1.13.X, but it seems to always happen on 1.15.1.

      I used a debugger to look at what is going on in the JobManager:

      main-EventThread[1] where
        [1] org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress (Compatibility.java:116)
        [2] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString (EnsembleTracker.java:185)
        [3] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData (EnsembleTracker.java:206)
        [4] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300 (EnsembleTracker.java:50)
        [5] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult (EnsembleTracker.java:150)
        [6] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback (CuratorFrameworkImpl.java:926)
        [7] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation (CuratorFrameworkImpl.java:683)
        [8] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation (WatcherRemovalFacade.java:152)
        [9] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult (GetConfigBuilderImpl.java:222)
        [10] org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent (ClientCnxn.java:598)
        [11] org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run (ClientCnxn.java:510)
      main-EventThread[1] dump address
       address = {
          holder: instance of java.net.InetSocketAddress$InetSocketAddressHolder(id=8302)
          serialVersionUID: 5076001401234631237
          serialPersistentFields: instance of java.io.ObjectStreamField[3] (id=8303)
          UNSAFE: instance of jdk.internal.misc.Unsafe(id=8304)
          FIELDS_OFFSET: 12
          java.net.SocketAddress.serialVersionUID: 5215720748342549866
      }
      main-EventThread[1] dump address.holder
       address.holder = {
          hostname: "host_name_here"
          addr: null
          port: 2888
      }
      main-EventThread[1] print address.getAddress()
       address.getAddress() = null
      

      (The hostname has been changed).

      Line 116 of Compatibility.java (https://github.com/apache/curator/blob/d65669b64f003326c98843b32b997e3ffab1e442/curator-client/src/main/java/org/apache/curator/utils/Compatibility.java#L116) reads:

              return (address != null) ? address.getAddress().getHostAddress() : "unknown";
      

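      Here address is not null, but address.getAddress() returns null because the InetSocketAddress is unresolved (addr is null in the dump above). The chained getHostAddress() call then throws the NullPointerException that causes the crash.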
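
      The behaviour can be reproduced outside Flink with the JDK alone. The following is a minimal sketch, not the Curator code: the class name UnresolvedAddressDemo and the method safeHostAddress are hypothetical and exist only for illustration. quotedHostAddress copies the expression quoted above. The sketch builds an unresolved InetSocketAddress with the same shape as the one in the debugger dump and shows one possible null-safe alternative that falls back to the host string.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

class UnresolvedAddressDemo {
    // Same expression as the line quoted from Compatibility.java (line 116).
    static String quotedHostAddress(InetSocketAddress address) {
        return (address != null) ? address.getAddress().getHostAddress() : "unknown";
    }

    // Hypothetical null-safe variant (an assumption for illustration, not a Curator fix):
    // falls back to the unresolved host string when no InetAddress is available.
    static String safeHostAddress(InetSocketAddress address) {
        if (address == null) {
            return "unknown";
        }
        InetAddress resolved = address.getAddress(); // null for an unresolved address
        return (resolved != null) ? resolved.getHostAddress() : address.getHostString();
    }

    public static void main(String[] args) {
        // createUnresolved never performs a DNS lookup, so addr stays null,
        // matching the "addr: null" field in the debugger dump.
        InetSocketAddress address =
                InetSocketAddress.createUnresolved("host_name_here", 2888);

        System.out.println("unresolved: " + address.isUnresolved());
        System.out.println("getAddress(): " + address.getAddress());

        try {
            quotedHostAddress(address);
        } catch (NullPointerException e) {
            System.out.println("quoted expression throws NullPointerException");
        }

        System.out.println("null-safe variant: " + safeHostAddress(address));
    }
}
```

      Whether a hostname is an acceptable substitute for an IP address in the connection string that EnsembleTracker builds is a separate question that this sketch does not answer.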

            People

              Assignee: Unassigned
              Reporter: Mynttinen Juha