Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Event 4.2.24
Description
In an AEM instance where a lot of Sling Job processing was ongoing I traced all cases where a ResourceResolver was opened. I did this for a period of around 30 seconds.
In 154 cases the stacktrace looked like this:
[...]
at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueJobCache.loadJobs(QueueJobCache.java:224) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:180) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
at org.apache.jackrabbit.commons.observation.ListenerTracker$1.onEvent(ListenerTracker.java:190) [org.apache.jackrabbit.jackrabbit-jcr-commons:2.20.0]
And in another 254 cases the stacktrace looked like this:
at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.JobHandler.persistJobProperties(JobHandler.java:217) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.JobHandler.startProcessing(JobHandler.java:74) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:190) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
In the case of loadJobs the ResourceResolver is opened to load all jobs, although I am not sure whether all jobs are really required at that point. To achieve this it traverses the repository to load all jobs matching a topic [1]. I wonder whether this traversal is necessary, because during the trace it was done around 5 times per second (154 times in about 30 seconds). Is it possible to optimize that, so that we don't have that much repository access?
I am not sure whether we can achieve something similar in the case of startProcessing (the second stacktrace), maybe by holding an RR as part of the QueueJobCache and passing it to startProcessing. This would allow us to use the RR for other purposes as well.
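To make the second idea more concrete, here is a minimal sketch of what "holding an RR in the cache and passing it to startProcessing" could look like. It does not use the real Sling classes: Resolver, JobCache and every method name below are hypothetical stand-ins, and the "separate resolvers" variant is only my reading of the two stacktraces above, not a copy of the actual QueueJobCache code. The sketch just counts how many resolvers get opened in each variant.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedResolverSketch {

    /** Hypothetical stand-in for a ResourceResolver; it only counts how often one is opened. */
    static final class Resolver implements AutoCloseable {
        static final AtomicInteger OPENED = new AtomicInteger();
        private boolean live = true;

        Resolver() { OPENED.incrementAndGet(); }

        boolean isLive() { return live; }

        @Override
        public void close() { live = false; }
    }

    /** Hypothetical stand-in for the queue's job cache (not the real QueueJobCache). */
    static final class JobCache implements AutoCloseable {
        private final Queue<String> repository;              // pretend persisted jobs
        private final Queue<String> cache = new ArrayDeque<>();
        private Resolver held;                                // opened lazily, "held" variant only

        JobCache(Queue<String> repository) { this.repository = repository; }

        /** Pattern suggested by the traces: one resolver to load jobs, another per started job. */
        String nextJobSeparateResolvers() {
            if (cache.isEmpty()) {
                try (Resolver r = new Resolver()) { loadJobs(r); }
            }
            String job = cache.poll();
            if (job != null) {
                try (Resolver r = new Resolver()) { startProcessing(job, r); }
            }
            return job;
        }

        /** Proposed pattern: the cache holds one resolver and passes it down. */
        String nextJobHeldResolver() {
            if (held == null || !held.isLive()) held = new Resolver();
            if (cache.isEmpty()) loadJobs(held);
            String job = cache.poll();
            if (job != null) startProcessing(job, held);
            return job;
        }

        private void loadJobs(Resolver r) {
            requireLive(r);
            // Moves every pretend persisted job into the in-memory cache.
            while (!repository.isEmpty()) cache.add(repository.poll());
        }

        private void startProcessing(String job, Resolver r) {
            requireLive(r); // a real implementation would persist the job properties here
        }

        private static void requireLive(Resolver r) {
            if (!r.isLive()) throw new IllegalStateException("resolver already closed");
        }

        @Override
        public void close() { if (held != null) held.close(); }
    }

    /** Runs the given number of jobs through one variant and returns how many resolvers were opened. */
    static int countOpens(boolean heldResolver, int jobs) {
        Queue<String> repository = new ArrayDeque<>();
        for (int i = 0; i < jobs; i++) repository.add("job-" + i);
        Resolver.OPENED.set(0);
        try (JobCache cache = new JobCache(repository)) {
            for (int i = 0; i < jobs; i++) {
                if (heldResolver) cache.nextJobHeldResolver();
                else cache.nextJobSeparateResolvers();
            }
        }
        return Resolver.OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println("separate resolvers opened: " + countOpens(false, 3));
        System.out.println("held resolver opened:      " + countOpens(true, 3));
    }
}
```

In this toy run with three jobs the separate variant opens four resolvers and the held variant opens one. A real change would have more to consider: a ResourceResolver is not thread-safe, and a long-lived one probably needs a refresh() before each use to see newly added jobs, so this is only meant to illustrate the direction.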