Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
jtsk_2.1, River_2.1.1
-
None
Description
The method
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
has the following code that throws a RuntimeException (the code comment suggests that it is supposed to be impossible, but it's not).
ProxyReg reg = findReg(proxys[i]);
if(reg != null )
else
{ throw new RuntimeException("discard error"); }//endif
Our QA does failover testing with two servers, each with a Reggie, where we deliberately crash and reboot server 1 then server 2 every 30 minutes continuously. In one case, we hit that RuntimeException. I don't know why we got a null reg (that's a problem for another defect, maybe an undiagnosed race of two discards put on a task queue? Maybe related to RIVER-37?). But it caused a catastrophic chain of events because the RuntimeException is not caught anywhere up the stack. In our case, it killed the LookupLocatorDiscovery$Notifier thread.
java.lang.RuntimeException: discard error
at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
at net.jini.discovery.LookupDiscoveryManager.access$500(92)
at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
I propose three changes:
1) change the discarded() method above to simply warn instead of throwing
2) put a try/catch(Throwable) around the listener invocation in
LookupLocatorDiscovery$Notifier.run()
3) put a similar try/catch around listener invocation in LookupDiscoveryManager.notifyListener
The idea behind #2 and #3 is that misbehaving listeners should not be allowed to derail the discovery process.