Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
scr-2.1.16
None
None
Description
When SCR loads a declarative services component that implements EventListenerHook, a deadlock occurs. It happens because SCR tries to get the service in order to deliver event notifications to it while it is still in the process of loading that same service. Below is a breakdown of the deadlock stack trace we ran into. I spent some time identifying the services being interacted with at each stage of the thread stack traces to come to this conclusion.
After some thinking, it seems the fix would be to check whether an EventListenerHook that needs to be loaded is the same service that is currently being loaded. I THINK that would prevent this deadlock from occurring. The problem can be worked around, but it is confusing when it occurs.
Scott Lewis (who runs the ECF project) said it was intermittent for him. I ran into it with Equinox first, switched to Felix, and then hit it every time I ran the project using an exported bndtools jar with ECF. Scott initially logged this against Equinox and there was some discussion there. I'm attaching that issue to this one in case it is useful.
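To make the proposed check concrete, here is a rough sketch of the idea in plain Java. It is not Felix or Equinox code: the class HookReentrancyGuard and its methods beginLoading, endLoading and hooksToNotify are hypothetical names, and services are represented only by their numeric ids. The assumption is that the thread loading a component records that fact, and that event delivery on the same thread skips any hook whose service is still being loaded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; none of these names exist in Felix SCR or Equinox.
public class HookReentrancyGuard {
    // Ids of the services that the current thread is in the middle of loading.
    private static final ThreadLocal<Set<Long>> LOADING =
            ThreadLocal.withInitial(HashSet::new);

    // Called (in this sketch) just before a component's service is registered.
    static void beginLoading(long serviceId) {
        LOADING.get().add(serviceId);
    }

    // Called once the component has finished loading.
    static void endLoading(long serviceId) {
        LOADING.get().remove(serviceId);
    }

    // Returns the hook service ids that can be notified from this thread
    // without asking for a service that this same thread is still loading.
    static List<Long> hooksToNotify(List<Long> hookServiceIds) {
        List<Long> safe = new ArrayList<>();
        for (Long id : hookServiceIds) {
            if (!LOADING.get().contains(id)) {
                safe.add(id);
            }
        }
        return safe;
    }
}
```

With service 7 marked as loading, hooksToNotify for hooks 7 and 8 returns only 8. Whether skipping the notification is acceptable for the hook being loaded is a separate design question that this sketch does not answer.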
In the breakdown and stack traces below, the TopologyManager class (from the ECF project) is being loaded by SCR. That class implements the EventListenerHook interface:
Main thread:
SCR tries to register the TopologyManager
Service event type 1 (REGISTERED) is fired
Equinox/Felix iterates the event listener hooks, of which the TopologyManager is one, so it tries to get the TopologyManager service (to deliver the notification)
That lookup leads to an attempt to retrieve the service in order to update the change count
The ComponentRegistry updateChangeCount method is called
It blocks waiting for the changeCountLock monitor
Timer Thread 0:
ComponentRegistry acquires the changeCountLock
The SCR service properties are modified (service.changecount)
Service event type 2 (MODIFIED) is fired
It tries to retrieve the TopologyManager, because it is an EventListenerHook that must be notified of the event
It then waits on the service count latch of the static class in ServiceHolder
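The shape of the two-thread cycle above can be modelled in a few lines of plain Java. This is only an illustration, not the Felix code: ScrDeadlockSketch, reproduce and the serviceReady latch are made-up names, the ReentrantLock stands in for the changeCountLock monitor, and timeouts are used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model of the lock cycle; all names here are hypothetical.
public class ScrDeadlockSketch {
    // Stand-in for the changeCountLock monitor in ComponentRegistry.
    static final ReentrantLock changeCountLock = new ReentrantLock();

    // Returns true if both threads stalled, i.e. they would have deadlocked
    // had they waited without a timeout.
    static boolean reproduce(long timeoutMillis) {
        // Stand-in for the latch in ServiceHolder: released only once the
        // component (TopologyManager) has finished loading.
        CountDownLatch serviceReady = new CountDownLatch(1);
        CountDownLatch timerHoldsLock = new CountDownLatch(1);
        boolean[] timerStuck = {false};
        boolean[] mainStuck = {false};

        Thread timer = new Thread(() -> {
            changeCountLock.lock(); // change count update in progress
            try {
                timerHoldsLock.countDown();
                // MODIFIED event -> hook lookup -> wait for the loading component
                timerStuck[0] = !serviceReady.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                changeCountLock.unlock();
            }
        }, "Timer-0");

        Thread main = new Thread(() -> {
            try {
                timerHoldsLock.await();
                // REGISTERED event -> hook lookup -> updateChangeCount needs the lock
                if (changeCountLock.tryLock(timeoutMillis / 2, TimeUnit.MILLISECONDS)) {
                    changeCountLock.unlock();
                    serviceReady.countDown(); // loading could complete
                } else {
                    mainStuck[0] = true; // loading never completes
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "main-sketch");

        timer.start();
        main.start();
        try {
            timer.join();
            main.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return timerStuck[0] && mainStuck[0];
    }
}
```

Calling ScrDeadlockSketch.reproduce(200) shows both threads giving up: the timer thread holds the lock while waiting for the component, and the loading thread needs that lock before the component can become available.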
The stack trace is from Scott. I didn't save the stack traces from the threads I was investigating, but I can easily get them if the explanation above isn't enough to reproduce the problem.