Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
ManifoldCF 1.7.2, ManifoldCF 1.8, ManifoldCF 2.0
None
Description
Starting a job with 200K+ documents now takes many minutes. The reason seems to be document reprioritization, which has a significant bottleneck. A thread dump shows:
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:694)
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:750)
at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performUpdate(DBInterfaceHSQLDB.java:296)
at org.apache.manifoldcf.core.database.BaseTable.performUpdate(BaseTable.java:80)
at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValues(BinManager.java:158)
at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getIncrementBinValue(ReprioritizationTracker.java:328)
at org.apache.manifoldcf.crawler.system.PriorityCalculator.getDocumentPriority(PriorityCalculator.java:145)
at org.apache.manifoldcf.crawler.jobs.JobQueue.writeDocPriority(JobQueue.java:874)
at org.apache.manifoldcf.crawler.jobs.JobManager.writeDocumentPriorities(JobManager.java:2142)
at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1121)
at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:1054)
at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:141)