Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
We upgraded to 0.9.5 and ran into the following exception. The supervisors did go down.
One caveat about our upgrade: we started a new nimbus without any supervisors attached, then deployed topologies (from CI/CD). Next we built new supervisors, which start automatically on boot. However, in between, the network service was restarted (because the hostname changed during the build, via Chef). Just want to mention this in case it makes a difference.
In other words, it could be that the supervisors started, picked up work, and then the network restarted.
SEVERE: RuntimeException while executing runnable org.apache.storm.guava.util.concurrent.Futures$4@445058b with executor org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@691bc565
java.lang.RuntimeException: Failed to connect to Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700
    at backtype.storm.messaging.netty.Client.connect(Client.java:308)
    at backtype.storm.messaging.netty.Client.access$1100(Client.java:78)
    at backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:297)
    at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:283)
    at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
    at org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
    at org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
    at org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
    at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
    at java.util.concurrent.FutureTask.set(FutureTask.java:233)
    at java.util.concurrent.FutureTask.run(FutureTask.java:274)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Giving up to connect to Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700 after 102 failed attempts
    at backtype.storm.messaging.netty.Client.connect(Client.java:303)