Currently, to purge old jobs, we rely on the job manager thread: it checks whether any jobs need to be run and, if the pool is empty, checks for old jobs to purge.
I detected a problem that appears when you have several job servers and two pools.
In my case, the first pool runs regular jobs and the second receives large asynchronous services (through persisted calls).
The servers that manage the second pool rarely get the chance to purge their jobs, which in turn increases the size of the JobSandbox table.
The servers that manage the first pool call the purge process often, but because of the table size each query takes a long time to purge only a few elements. With a long query that runs often, this purge process consumes 15% of the database load (even after a correct reindex).
After analysis, I propose this refactoring:
- When querying the JobSandbox table for a purge, limit the query to the maximum thread pool size. Once the database returns, only that many jobs are purged anyway, so the limit keeps the database from scanning the whole table. Another small improvement: do not sort the result (sorting brings no functional gain). With these changes, each query goes from 3.5s to 0.30s (on PostgreSQL).
- Purging from the pool thread is fine, but each server can only purge its own jobs (filtered on run_by_instance_id). To help an overloaded server, I rehabilitated the purgeOldJobs service, which you can run with specific parameters to assist it.
- Finally, all job services have historically lived in the class org.apache.ofbiz.service.ServiceUtil; I moved them all to a new dedicated class, org.apache.ofbiz.service.job.JobServices.
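
To illustrate the first point, here is a minimal, self-contained Java sketch. It is not the OFBiz implementation: the Job class, the selectPurgeCandidates method, and the in-memory list standing in for the JobSandbox table are all hypothetical names chosen for this example. It only shows the idea of filtering on the owning instance, skipping any sort, and stopping as soon as the limit (assumed here to be the maximum thread pool size) is reached.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PurgeSketch {

    /** Hypothetical stand-in for one JobSandbox row; not the OFBiz entity. */
    static final class Job {
        final String jobId;
        final String runByInstanceId;
        final Instant finishDateTime; // null while the job has not finished

        Job(String jobId, String runByInstanceId, Instant finishDateTime) {
            this.jobId = jobId;
            this.runByInstanceId = runByInstanceId;
            this.finishDateTime = finishDateTime;
        }
    }

    /**
     * Returns at most `limit` finished jobs owned by `instanceId` and older
     * than `purgeBefore`. Rows are taken in whatever order the table yields
     * them (no sort), and the scan stops once the batch is full.
     */
    static List<Job> selectPurgeCandidates(List<Job> table, String instanceId,
                                           Instant purgeBefore, int limit) {
        List<Job> batch = new ArrayList<>();
        for (Job job : table) {
            if (batch.size() >= limit) {
                break; // batch is full: do not look at the rest of the table
            }
            if (instanceId.equals(job.runByInstanceId)
                    && job.finishDateTime != null
                    && job.finishDateTime.isBefore(purgeBefore)) {
                batch.add(job);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        Instant old = Instant.parse("2020-01-01T00:00:00Z");
        Instant recent = Instant.parse("2020-01-10T00:00:00Z");
        Instant purgeBefore = Instant.parse("2020-01-05T00:00:00Z");

        List<Job> table = Arrays.asList(
                new Job("1", "ofbiz1", old),
                new Job("2", "ofbiz2", old),    // owned by another server
                new Job("3", "ofbiz1", old),
                new Job("4", "ofbiz1", recent), // too recent to purge
                new Job("5", "ofbiz1", null),   // not finished yet
                new Job("6", "ofbiz1", old));

        int maxThreads = 2; // assumed pool size, used as the query limit
        List<Job> batch = selectPurgeCandidates(table, "ofbiz1", purgeBefore, maxThreads);
        System.out.println(batch.size() + " jobs selected for purge");
    }
}
```

Calling the same method with another server's instance id is analogous in spirit to the assistance described for purgeOldJobs, although the real service's parameters are not shown here.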