Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
I was doing some lock analysis and found that we have quite a bit of contention on ZkStateReader$LazyCollectionRef.get(boolean) during heavy collection creation. I ran a sample workload creating as many collections as I could in 10 minutes, and this method was blocked for about 1:30 of that, which is a pretty significant portion.
A few representative stack traces:
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.cloud.ZkController.checkIfCoreNodeNameAlreadyExists(CoreDescriptor)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
And another:
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.common.cloud.ZkStateReader.getCollection(String)
org.apache.solr.cloud.ZkController.publish(CoreDescriptor, Replica$State, boolean, boolean)
org.apache.solr.cloud.ZkController.preRegister(CoreDescriptor, boolean)
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreDescriptor, boolean, boolean)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
And one more:
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.common.cloud.ZkStateReader.registerDocCollectionWatcher(String, DocCollectionWatcher)
org.apache.solr.common.cloud.ZkStateReader.waitForState(String, long, TimeUnit, Predicate)
org.apache.solr.cloud.ZkController.checkStateInZk(CoreDescriptor)
org.apache.solr.cloud.ZkController.preRegister(CoreDescriptor, boolean)
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreDescriptor, boolean, boolean)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
It looks like part of the problem is that these call paths never allow themselves to use the cache, so each call ends up going out to ZK. We do have optimizations there that compare the stat and the version before doing a full fetch, but it still appears to be relatively heavyweight.
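To make the pattern concrete, here is a minimal sketch of a lazily loaded reference with a synchronized get(boolean allowCached). This is not the Solr code: LazyRefSketch, FakeStore, Snapshot, statVersion() and fetch() are hypothetical names, and FakeStore is an in-memory stand-in for ZK. It only illustrates why passing allowCached=false means at least one remote round trip per call while the lock is held, even when the version check avoids the full fetch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; names and structure are assumptions, not Solr's implementation. */
class LazyRefSketch {

    /** Immutable pair of data and the version it was read at. */
    static final class Snapshot {
        final int version;
        final String data;
        Snapshot(int version, String data) { this.version = version; this.data = data; }
    }

    /** Hypothetical in-memory stand-in for the remote store (ZK in this ticket). */
    static class FakeStore {
        final AtomicInteger statCalls = new AtomicInteger();
        final AtomicInteger fetchCalls = new AtomicInteger();
        private Snapshot current = new Snapshot(0, "state-v0");

        /** Cheap check: returns only the version, like a stat lookup. */
        synchronized int statVersion() {
            statCalls.incrementAndGet();
            return current.version;
        }

        /** Expensive call: returns the full state. */
        synchronized Snapshot fetch() {
            fetchCalls.incrementAndGet();
            return current;
        }

        synchronized void update(String newData) {
            current = new Snapshot(current.version + 1, newData);
        }
    }

    private final FakeStore store;
    private Snapshot cached;

    LazyRefSketch(FakeStore store) { this.store = store; }

    /** All callers serialize on this lock, so remote latency here becomes contention. */
    synchronized String get(boolean allowCached) {
        if (allowCached && cached != null) {
            return cached.data; // served locally, no remote call
        }
        // Not allowed to trust the cache: compare versions first,
        // and only do the full fetch if the cached copy is missing or stale.
        if (cached == null || store.statVersion() != cached.version) {
            cached = store.fetch();
        }
        return cached.data;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        LazyRefSketch ref = new LazyRefSketch(store);
        for (int i = 0; i < 3; i++) {
            ref.get(false);
        }
        System.out.println(store.statCalls.get() + " stat calls, "
            + store.fetchCalls.get() + " fetches");
    }
}
```

In this sketch, repeated get(false) calls avoid the full fetch once the cache is warm, but each one still pays for a stat call inside the synchronized block. With a real network hop in place of FakeStore, that is where concurrent callers would queue up.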
cc: noble.paul, you might find this interesting.
Reduce duplicative core creation work | Closed | Mike Drob