Solr
  1. Solr
  2. SOLR-8280

Schemaless features don't work reliably with SolrCoreAware sim factory -- example: SchemaSimilarityFactory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      SOLR-8271 uncovered problems with using SchemaSimilarityFactory + schemaless features. While the broader problems of SolrCoreAware objects inited after the SolrCore was live have been spun off into SOLR-8311 this issue focuses on fixing & testing the core problem of ensuring SchemaSimilarityFactory + schemaless features function together.

      original bug report

      Something about the code path(s) involved in TestCloudSchemaless & ChangedSchemaMergeTest don't play nicely with a SimilarityFactory that is SolrCoreAware – notably: SchemaSimilarityFactory.

      I discovered this while trying to implement SOLR-8271, but it can be reproduced trivially by modifying the schema-add-schema-fields-update-processor.xml file used by TestCloudSchemaless (and hardcoded in java schema used by ChangedSchemaMergeTest) to refer to SchemaSimilarityFactory explicitly. Other cloud tests (such as CollectionReloadTest) or cloud+schemaless (ex: TestCloudManagedSchema) tests don't seem to demonstrate the same problem.

      1. SOLR-8280__broken__resource_loader_experiment.patch
        10 kB
        Hoss Man
      2. SOLR-8280.patch
        15 kB
        Hoss Man
      3. SOLR-8280.patch
        7 kB
        Hoss Man
      4. SOLR-8280.patch
        2 kB
        Hoss Man
      5. SOLR-8280.patch
        1 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          With the attached patch, it's trivial to trigger a suite failure in TestCloudSchemaless, although lots of other cloud related tests (using schema.xml via AbstractFullDistribZkTestBase) function just fine.

          (NOTE: there are in fact a handful of other test failures with this patch, but they exist due to specific Sim related assumptions in those tests vs the general change of schema.xml to using SchemaSimilarityFactory ... i included the change to schema.xml in this patch purely as an easy way to demonstrate that most cloud related tests – even ones using schemaless and/or involving core reloads – don't ffreak out the same way as TestCloudSchemaless)


          Quick walk through of the TestCloudSchemaless failure...

          ant test  -Dtestcase=TestCloudSchemaless -Dtests.seed=8F1B744FBE000231 -Dtests.slow=true
          

          The test itself runs fine, and reaches the SolrTestCaseJ4 ###Ending test" stage at which point shutdown begins – and eventually stalls in the dir factory waiting for directories to close...

             [junit4]   2> 52621 INFO  (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.SolrCore [collection1] Closing main searcher on request.
             [junit4]   2> 52621 INFO  (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Closing NRTCachingDirectoryFactory - 2 directories currently being tracked
             [junit4]   2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-1-001/cores/collection1/data/index;done=false>>
             [junit4]   2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-1-001/cores/collection1/data/index;done=false>>
             [junit4]   2> 	at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:187)
             [junit4]   2> 	at org.apache.solr.core.SolrCore.close(SolrCore.java:1268)
             [junit4]   2> 	at org.apache.solr.core.SolrCores.close(SolrCores.java:124)
             [junit4]   2> 	at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:570)
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:172)
             [junit4]   2> 	at org.apache.solr.cloud.ChaosMonkey.stopJettySolrRunner(ChaosMonkey.java:186)
             [junit4]   2> 	at org.apache.solr.cloud.ChaosMonkey.stop(ChaosMonkey.java:518)
             [junit4]   2> 	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.destroyServers(AbstractFullDistribZkTestBase.java:1514)
             [junit4]   2> 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:964)
             [junit4]   2> 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
             [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
             [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> 
             [junit4]   2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.SolrCore java.lang.AssertionError: 1
             [junit4]   2> 	at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:201)
             [junit4]   2> 	at org.apache.solr.core.SolrCore.close(SolrCore.java:1268)
             [junit4]   2> 	at org.apache.solr.core.SolrCores.close(SolrCores.java:124)
             [junit4]   2> 	at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:570)
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:172)
             [junit4]   2> 	at org.apache.solr.cloud.ChaosMonkey.stopJettySolrRunner(ChaosMonkey.java:186)
             [junit4]   2> 	at org.apache.solr.cloud.ChaosMonkey.stop(ChaosMonkey.java:518)
             [junit4]   2> 	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.destroyServers(AbstractFullDistribZkTestBase.java:1514)
             [junit4]   2> 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:964)
             [junit4]   2> 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
             [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
             [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
             [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
             [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> 
          

          ...at that point, things go from bad to worse, as RecoveryThreads kick in (why is the collection attempting to recover it it was deliberaly being shutdown???) but of course that doesn't turn out well because the CoreContainer's are all in shutdown mode...

             [junit4]   2> 69727 INFO  (qtp1007998027-153) [n:127.0.0.1:44195_    ] o.a.s.h.a.CoreAdminHandler Going to wait for coreNodeName: core_node8, state: recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
             [junit4]   2> 69727 INFO  (zkCallback-44-thread-2-processing-n:127.0.0.1:42678_) [n:127.0.0.1:42678_    ] o.a.s.c.c.ZkStateReader Updating data for collection1 from 29 to 30 
             [junit4]   2> 69728 INFO  (zkCallback-39-thread-2-processing-n:127.0.0.1:52349_) [n:127.0.0.1:52349_    ] o.a.s.c.c.ZkStateReader Updating data for collection1 from 29 to 30 
             [junit4]   2> 69728 INFO  (qtp1007998027-153) [n:127.0.0.1:44195_    ] o.a.s.h.a.CoreAdminHandler Will wait a max of 183 seconds to see collection1 (shard1 of collection1) have state: recovering
             [junit4]   2> 69728 INFO  (qtp1007998027-153) [n:127.0.0.1:44195_    ] o.a.s.h.a.CoreAdminHandler In WaitForState(recovering): collection=collection1, shard=shard1, thisCore=collection1, leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true, currentState=recovering, localState=active, nodeName=127.0.0.1:42678_, coreNodeName=core_node8, onlyIfActiveCheckResult=false, nodeProps: core_node8:{"core":"collection1","base_url":"http://127.0.0.1:42678","node_name":"127.0.0.1:42678_","state":"recovering"}
             [junit4]   2> 69729 INFO  (qtp1007998027-153) [n:127.0.0.1:44195_    ] o.a.s.h.a.CoreAdminHandler Waited coreNodeName: core_node8, state: recovering, checkLive: true, onlyIfLeader: true for: 0 seconds.
             [junit4]   2> 69729 INFO  (qtp1007998027-153) [n:127.0.0.1:44195_    ] o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores params={nodeName=127.0.0.1:42678_&onlyIfLeaderActive=true&core=collection1&coreNodeName=core_node8&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2} status=0 QTime=1 
             [junit4] HEARTBEAT J0 PID(17711@tray): 2015-11-11T16:22:31, stalled for 70.2s at: TestCloudSchemaless.test
             [junit4]   2> 72390 INFO  (RecoveryThread-collection1) [n:127.0.0.1:44717_ c:collection1 s:shard3 r:core_node6 x:collection1] o.a.s.c.RecoveryStrategy Starting Replication Recovery.
             [junit4]   2> 72390 INFO  (RecoveryThread-collection1) [n:127.0.0.1:44717_ c:collection1 s:shard3 r:core_node6 x:collection1] o.a.s.c.RecoveryStrategy Attempting to replicate from http://127.0.0.1:54475/collection1/.
             [junit4]   2> 72391 ERROR (qtp1661135858-94) [n:127.0.0.1:54475_    ] o.a.s.s.SolrDispatchFilter Error processing the request. CoreContainer is either not initialized or shutting down.
             [junit4]   2> 72392 WARN  (qtp1661135858-94) [n:127.0.0.1:54475_    ] o.e.j.s.ServletHandler /collection1/update
             [junit4]   2> org.apache.solr.common.SolrException: Error processing the request. CoreContainer is either not initialized or shutting down.
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:187)
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
             [junit4]   2> 	at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
             [junit4]   2> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
             [junit4]   2> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
             [junit4]   2> 	at org.eclipse.jetty.server.Server.handle(Server.java:499)
             [junit4]   2> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
             [junit4]   2> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
             [junit4]   2> 	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
             [junit4]   2> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
             [junit4]   2> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
          

          ...and then things really get weird, as the RecoveryThread starts trying to do replication recovery, and then nodes start complaining that they can't open a new searcher because SolrCoreAware.inform() was never called on the SimilarityFactory (an assert that exists inside SchemaSimilarityFactory.getSimilarity()) ...

             [junit4]   2> 74329 INFO  (RecoveryThread-collection1) [n:127.0.0.1:52349_ c:collection1 s:shard4 r:core_node7 x:collection1] o.a.s.c.RecoveryStrategy Attempting to replicate from http://127.0.0.1:54695/collection1/.
             [junit4]   2> 74331 INFO  (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.u.UpdateHandler start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
             [junit4]   2> 74359 INFO  (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.c.SolrCore SolrDeletionPolicy.onCommit: commits: num=2
             [junit4]   2> 	commit{dir=NRTCachingDirectory(MMapDirectory@/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-3-001/cores/collection1/data/index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@5720ab9c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4}
             [junit4]   2> 	commit{dir=NRTCachingDirectory(MMapDirectory@/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-3-001/cores/collection1/data/index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@5720ab9c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_5,generation=5}
             [junit4]   2> 74359 INFO  (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.c.SolrCore newest commit generation = 5
             [junit4]   2> 74361 INFO  (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.s.SolrIndexSearcher Opening Searcher@5428727f[collection1] realtime
             [junit4]   2> 74361 INFO  (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.u.p.LogUpdateProcessor [collection1] webapp= path=/update params={waitSearcher=true&openSearcher=false&commit=true&softCommit=false&commit_end_point=true&wt=javabin&version=2} {} 0 30
             [junit4]   2> 74362 ERROR (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.s.SolrDispatchFilter null:java.lang.RuntimeException: java.lang.AssertionError: inform must be called first
             [junit4]   2> 	at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:616)
             [junit4]   2> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:475)
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
             [junit4]   2> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
             [junit4]   2> 	at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
             [junit4]   2> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
             [junit4]   2> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
             [junit4]   2> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
             [junit4]   2> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
             [junit4]   2> 	at org.eclipse.jetty.server.Server.handle(Server.java:499)
             [junit4]   2> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
             [junit4]   2> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
             [junit4]   2> 	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
             [junit4]   2> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
             [junit4]   2> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> Caused by: java.lang.AssertionError: inform must be called first
             [junit4]   2> 	at org.apache.solr.search.similarities.SchemaSimilarityFactory.getSimilarity(SchemaSimilarityFactory.java:86)
             [junit4]   2> 	at org.apache.solr.schema.IndexSchema.getSimilarity(IndexSchema.java:269)
             [junit4]   2> 	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:255)
             [junit4]   2> 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1628)
             [junit4]   2> 	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:635)
             [junit4]   2> 	at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
             [junit4]   2> 	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1616)
             [junit4]   2> 	at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1593)
             [junit4]   2> 	at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
             [junit4]   2> 	at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
             [junit4]   2> 	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
             [junit4]   2> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:151)
             [junit4]   2> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2079)
             [junit4]   2> 	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:667)
             [junit4]   2> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
             [junit4]   2> 	... 23 more
          

          ...and then eventually the suite times out.


          There are lots of questions here, notably:

          1. what makes this tests special that it fails so spectaculaly like this?
            • in particular why does this SimilarityFactory change cause the directory ref counting to fail?
          2. why is recovery happening after shutdown?
          3. regardless of how/why the other things in the test go so bizare in the first place, what code path is leading to the SchemaSimilarityFactory that hasn't had .inform(SolrCore) called on it?
          Show
          Hoss Man added a comment - With the attached patch, it's trivial to trigger a suite failure in TestCloudSchemaless, although lots of other cloud related tests (using schema.xml via AbstractFullDistribZkTestBase) function just fine. (NOTE: there are in fact a handful of other test failures with this patch, but they exist due to specific Sim related assumptions in those tests vs the general change of schema.xml to using SchemaSimilarityFactory ... i included the change to schema.xml in this patch purely as an easy way to demonstrate that most cloud related tests – even ones using schemaless and/or involving core reloads – don't ffreak out the same way as TestCloudSchemaless) Quick walk through of the TestCloudSchemaless failure... ant test -Dtestcase=TestCloudSchemaless -Dtests.seed=8F1B744FBE000231 -Dtests.slow=true The test itself runs fine, and reaches the SolrTestCaseJ4 ###Ending test" stage at which point shutdown begins – and eventually stalls in the dir factory waiting for directories to close... [junit4] 2> 52621 INFO (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.SolrCore [collection1] Closing main searcher on request. [junit4] 2> 52621 INFO (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Closing NRTCachingDirectoryFactory - 2 directories currently being tracked [junit4] 2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-1-001/cores/collection1/data/index;done=false>> [junit4] 2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-1-001/cores/collection1/data/index;done=false>> [junit4] 2> at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:187) [junit4] 2> at org.apache.solr.core.SolrCore.close(SolrCore.java:1268) [junit4] 2> at org.apache.solr.core.SolrCores.close(SolrCores.java:124) [junit4] 2> at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:570) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:172) [junit4] 2> at org.apache.solr.cloud.ChaosMonkey.stopJettySolrRunner(ChaosMonkey.java:186) [junit4] 2> at org.apache.solr.cloud.ChaosMonkey.stop(ChaosMonkey.java:518) [junit4] 2> at org.apache.solr.cloud.AbstractFullDistribZkTestBase.destroyServers(AbstractFullDistribZkTestBase.java:1514) [junit4] 2> at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:964) [junit4] 2> at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) [junit4] 2> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) [junit4] 2> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) [junit4] 2> at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) [junit4] 2> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) [junit4] 2> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) [junit4] 2> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) [junit4] 2> at java.lang.Thread.run(Thread.java:745) [junit4] 2> [junit4] 2> 64733 ERROR (TEST-TestCloudSchemaless.test-seed#[E3CE46BC04E096EF]) [n:127.0.0.1:35554_ c:collection1 s:shard2 r:core_node1 x:collection1] o.a.s.c.SolrCore java.lang.AssertionError: 1 [junit4] 2> at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:201) [junit4] 2> at org.apache.solr.core.SolrCore.close(SolrCore.java:1268) [junit4] 2> at org.apache.solr.core.SolrCores.close(SolrCores.java:124) [junit4] 2> at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:570) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:172) [junit4] 2> at org.apache.solr.cloud.ChaosMonkey.stopJettySolrRunner(ChaosMonkey.java:186) [junit4] 2> at org.apache.solr.cloud.ChaosMonkey.stop(ChaosMonkey.java:518) [junit4] 2> at org.apache.solr.cloud.AbstractFullDistribZkTestBase.destroyServers(AbstractFullDistribZkTestBase.java:1514) [junit4] 2> at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:964) [junit4] 2> at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) [junit4] 2> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) [junit4] 2> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) [junit4] 2> at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) [junit4] 2> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811) [junit4] 2> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) [junit4] 2> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) [junit4] 2> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) [junit4] 2> at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4] 2> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) [junit4] 2> at java.lang.Thread.run(Thread.java:745) [junit4] 2> ...at that point, things go from bad to worse, as RecoveryThreads kick in (why is the collection attempting to recover it it was deliberaly being shutdown???) but of course that doesn't turn out well because the CoreContainer's are all in shutdown mode... [junit4] 2> 69727 INFO (qtp1007998027-153) [n:127.0.0.1:44195_ ] o.a.s.h.a.CoreAdminHandler Going to wait for coreNodeName: core_node8, state: recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true [junit4] 2> 69727 INFO (zkCallback-44-thread-2-processing-n:127.0.0.1:42678_) [n:127.0.0.1:42678_ ] o.a.s.c.c.ZkStateReader Updating data for collection1 from 29 to 30 [junit4] 2> 69728 INFO (zkCallback-39-thread-2-processing-n:127.0.0.1:52349_) [n:127.0.0.1:52349_ ] o.a.s.c.c.ZkStateReader Updating data for collection1 from 29 to 30 [junit4] 2> 69728 INFO (qtp1007998027-153) [n:127.0.0.1:44195_ ] o.a.s.h.a.CoreAdminHandler Will wait a max of 183 seconds to see collection1 (shard1 of collection1) have state: recovering [junit4] 2> 69728 INFO (qtp1007998027-153) [n:127.0.0.1:44195_ ] o.a.s.h.a.CoreAdminHandler In WaitForState(recovering): collection=collection1, shard=shard1, thisCore=collection1, leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true, currentState=recovering, localState=active, nodeName=127.0.0.1:42678_, coreNodeName=core_node8, onlyIfActiveCheckResult=false, nodeProps: core_node8:{"core":"collection1","base_url":"http://127.0.0.1:42678","node_name":"127.0.0.1:42678_","state":"recovering"} [junit4] 2> 69729 INFO (qtp1007998027-153) [n:127.0.0.1:44195_ ] o.a.s.h.a.CoreAdminHandler Waited coreNodeName: core_node8, state: recovering, checkLive: true, onlyIfLeader: true for: 0 seconds. [junit4] 2> 69729 INFO (qtp1007998027-153) [n:127.0.0.1:44195_ ] o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores params={nodeName=127.0.0.1:42678_&onlyIfLeaderActive=true&core=collection1&coreNodeName=core_node8&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2} status=0 QTime=1 [junit4] HEARTBEAT J0 PID(17711@tray): 2015-11-11T16:22:31, stalled for 70.2s at: TestCloudSchemaless.test [junit4] 2> 72390 INFO (RecoveryThread-collection1) [n:127.0.0.1:44717_ c:collection1 s:shard3 r:core_node6 x:collection1] o.a.s.c.RecoveryStrategy Starting Replication Recovery. [junit4] 2> 72390 INFO (RecoveryThread-collection1) [n:127.0.0.1:44717_ c:collection1 s:shard3 r:core_node6 x:collection1] o.a.s.c.RecoveryStrategy Attempting to replicate from http://127.0.0.1:54475/collection1/. [junit4] 2> 72391 ERROR (qtp1661135858-94) [n:127.0.0.1:54475_ ] o.a.s.s.SolrDispatchFilter Error processing the request. CoreContainer is either not initialized or shutting down. [junit4] 2> 72392 WARN (qtp1661135858-94) [n:127.0.0.1:54475_ ] o.e.j.s.ServletHandler /collection1/update [junit4] 2> org.apache.solr.common.SolrException: Error processing the request. CoreContainer is either not initialized or shutting down. [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:187) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [junit4] 2> at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [junit4] 2> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [junit4] 2> at org.eclipse.jetty.server.Server.handle(Server.java:499) [junit4] 2> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) [junit4] 2> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) [junit4] 2> at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [junit4] 2> at java.lang.Thread.run(Thread.java:745) ...and then things really get weird, as the RecoveryThread starts trying to do replication recovery, and then nodes start complaining that they can't open a new searcher because SolrCoreAware.inform() was never called on the SimilarityFactory (an assert that exists inside SchemaSimilarityFactory.getSimilarity()) ... [junit4] 2> 74329 INFO (RecoveryThread-collection1) [n:127.0.0.1:52349_ c:collection1 s:shard4 r:core_node7 x:collection1] o.a.s.c.RecoveryStrategy Attempting to replicate from http://127.0.0.1:54695/collection1/. [junit4] 2> 74331 INFO (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.u.UpdateHandler start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} [junit4] 2> 74359 INFO (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.c.SolrCore SolrDeletionPolicy.onCommit: commits: num=2 [junit4] 2> commit{dir=NRTCachingDirectory(MMapDirectory@/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-3-001/cores/collection1/data/index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@5720ab9c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} [junit4] 2> commit{dir=NRTCachingDirectory(MMapDirectory@/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.schema.TestCloudSchemaless_E3CE46BC04E096EF-001/shard-3-001/cores/collection1/data/index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@5720ab9c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_5,generation=5} [junit4] 2> 74359 INFO (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.c.SolrCore newest commit generation = 5 [junit4] 2> 74361 INFO (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.s.SolrIndexSearcher Opening Searcher@5428727f[collection1] realtime [junit4] 2> 74361 INFO (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.u.p.LogUpdateProcessor [collection1] webapp= path=/update params={waitSearcher=true&openSearcher=false&commit=true&softCommit=false&commit_end_point=true&wt=javabin&version=2} {} 0 30 [junit4] 2> 74362 ERROR (qtp1972553455-125) [n:127.0.0.1:54695_ c:collection1 s:shard4 r:core_node3 x:collection1] o.a.s.s.SolrDispatchFilter null:java.lang.RuntimeException: java.lang.AssertionError: inform must be called first [junit4] 2> at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:616) [junit4] 2> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:475) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [junit4] 2> at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [junit4] 2> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [junit4] 2> at org.eclipse.jetty.server.Server.handle(Server.java:499) [junit4] 2> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) [junit4] 2> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) [junit4] 2> at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [junit4] 2> at java.lang.Thread.run(Thread.java:745) [junit4] 2> Caused by: java.lang.AssertionError: inform must be called first [junit4] 2> at org.apache.solr.search.similarities.SchemaSimilarityFactory.getSimilarity(SchemaSimilarityFactory.java:86) [junit4] 2> at org.apache.solr.schema.IndexSchema.getSimilarity(IndexSchema.java:269) [junit4] 2> at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:255) [junit4] 2> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1628) [junit4] 2> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:635) [junit4] 2> at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) [junit4] 2> at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1616) [junit4] 2> at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1593) [junit4] 2> at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161) [junit4] 2> at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) [junit4] 2> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) [junit4] 2> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:151) [junit4] 2> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2079) [junit4] 2> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:667) [junit4] 2> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460) [junit4] 2> ... 23 more ...and then eventually the suite times out. There are lots of questions here, notably: what makes this tests special that it fails so spectaculaly like this? in particular why does this SimilarityFactory change cause the directory ref counting to fail? why is recovery happening after shutdown? regardless of how/why the other things in the test go so bizare in the first place, what code path is leading to the SchemaSimilarityFactory that hasn't had .inform(SolrCore) called on it?
          Hide
          Hoss Man added a comment -

          After filing this, it occurred to me the original issue that lead me down this investigation (SOLR-8271) I was also seeing failures in ChangedSchemaMergeTest.testOptimizeDiffSchemas that i wasn't seeing with this patch. Looking into why I realized it's because that test uses it's own hardcoded schema(s) that are build on the fly from java code.

          In this new updated patch, i tweaked them to directly refer to SchemaSimilarityFactory, and now you can see the AssertionError: inform must be called first getting logged, and the subsequent cc.shutdown() giving up after a ref count mismatch on the directory factory.

          This is probably a simpler test to trouble shoot why the SolrCoreAware.inform isn't getting called properly then the cloud test, and hopefully can lead to a fix that helps with both tests.

          Show
          Hoss Man added a comment - After filing this, it occurred to me the original issue that lead me down this investigation ( SOLR-8271 ) I was also seeing failures in ChangedSchemaMergeTest.testOptimizeDiffSchemas that i wasn't seeing with this patch. Looking into why I realized it's because that test uses it's own hardcoded schema(s) that are build on the fly from java code. In this new updated patch, i tweaked them to directly refer to SchemaSimilarityFactory, and now you can see the AssertionError: inform must be called first getting logged, and the subsequent cc.shutdown() giving up after a ref count mismatch on the directory factory. This is probably a simpler test to trouble shoot why the SolrCoreAware.inform isn't getting called properly then the cloud test, and hopefully can lead to a fix that helps with both tests.
          Hide
          Hoss Man added a comment -

          The root problem seems to be that when using the SolrResourceLoader to create newInstances of objects, the loader is tracking what things are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean. Then, just before the SolrCore finishes initialiing itself, it calls a method on SolrResourceLoader to take appropriate action on to inform those instances (and/or add them to the MBean registry)

          The problem happens when any new instances are created by the SolrResourceLoader after the SolrCore is up and running – it currently has a live boolean it uses to just flat out ignore wether or not these instances are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean, meaning that nothing in the call stack ever informs them about the SolrCore.

          It looks like SOLR-4658 included a bit of a hack work arround for the ResourceLoaderAware schema elements (see IndexSchema's constructor which call's loader.inform(loader);...

          http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/IndexSchema.java?r1=1463182&r2=1463181&pathrev=1463182

          ...this seems realy sketchy because it causes any ResourceLoaderAware plugins inited so far by the core to be {{inform(ResourceLoader)}}ed once the first IndexSchema is created – even though that's not suppose to happen until mutch later in the SolrCore constructor just before the CountDownLatch is released.

          What it does do however is ensure that when a new schema gets loaded later (by the REST API, or a schemaless update processor) and ResourceLoaderAware fieldtypes/analyzers are good to go – but that doesn't do anything to help SolrCoreAware plugins like SimilarityFactory.

          I'm attaching a work in progress patch where I attempted to fix the underlying problem with SolrResourceLoader by having it keep a refrence to the SolrCore it's tied to such that any new instances after that the would be immediately informed of the SorlCore/ResourceLoader. This fixes some of the tests I mentioned before in this issue that have problems with SchemaSimilarityFactory but causes other failures in other existing test that reload the schema – because any FieldType that is ResourceLoader aware is now being "informed" of the loader as soon as it's instantiated – before even basic init() methods are called. Which makes sense in hind sight – my whole approach here is flawed because the contract is suppoes to be that the init methods will always be called first, and any (valid) inform methods will be called at some point after that once the core/loader is available, but before the instance is used ... calling "new" then "inform" then "init" is maddness.

          I honestly don't know if there is a sane way to solve this problem in the general case – the best thing I can think of at the moment is a similar special case hack for calling loader.inform(SolrCore) after any code that creates a schema (other then SolrCore)

          Show
          Hoss Man added a comment - The root problem seems to be that when using the SolrResourceLoader to create newInstances of objects, the loader is tracking what things are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean. Then, just before the SolrCore finishes initialiing itself, it calls a method on SolrResourceLoader to take appropriate action on to inform those instances (and/or add them to the MBean registry) The problem happens when any new instances are created by the SolrResourceLoader after the SolrCore is up and running – it currently has a live boolean it uses to just flat out ignore wether or not these instances are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean, meaning that nothing in the call stack ever informs them about the SolrCore. It looks like SOLR-4658 included a bit of a hack work arround for the ResourceLoaderAware schema elements (see IndexSchema's constructor which call's loader.inform(loader); ... http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/IndexSchema.java?r1=1463182&r2=1463181&pathrev=1463182 ...this seems realy sketchy because it causes any ResourceLoaderAware plugins inited so far by the core to be {{inform(ResourceLoader)}}ed once the first IndexSchema is created – even though that's not suppose to happen until mutch later in the SolrCore constructor just before the CountDownLatch is released. What it does do however is ensure that when a new schema gets loaded later (by the REST API, or a schemaless update processor) and ResourceLoaderAware fieldtypes/analyzers are good to go – but that doesn't do anything to help SolrCoreAware plugins like SimilarityFactory. I'm attaching a work in progress patch where I attempted to fix the underlying problem with SolrResourceLoader by having it keep a refrence to the SolrCore it's tied to such that any new instances after that the would be immediately informed of the SorlCore/ResourceLoader. This fixes some of the tests I mentioned before in this issue that have problems with SchemaSimilarityFactory but causes other failures in other existing test that reload the schema – because any FieldType that is ResourceLoader aware is now being "informed" of the loader as soon as it's instantiated – before even basic init() methods are called. Which makes sense in hind sight – my whole approach here is flawed because the contract is suppoes to be that the init methods will always be called first, and any (valid) inform methods will be called at some point after that once the core/loader is available, but before the instance is used ... calling "new" then "inform" then "init" is maddness. I honestly don't know if there is a sane way to solve this problem in the general case – the best thing I can think of at the moment is a similar special case hack for calling loader.inform(SolrCore) after any code that creates a schema (other then SolrCore)
          Hide
          Alan Woodward added a comment -

          I have a half-implemented patch hanging around somewhere that tried to clean this up a bit. I think the root problem is that there are two circumstances in which we're using SolrResourceLoader, a) during core initialization when we need to call init() immediately, but wait to call inform() until after the loading latch has been released, and then b) to create new objects once the core is up and serving queries. I tried to split this out into two separate SRL implementations, one of which is private to SolrCore and used only in the constructor, and does the call-init-and-then-delay-inform dance, and the other of which is returned by SolrCore.getResourceLoader() and inits() and informs() before it returns. To be honest though, I get so confused by the code paths here that I'm not sure whether or not that would help in this case...

          Show
          Alan Woodward added a comment - I have a half-implemented patch hanging around somewhere that tried to clean this up a bit. I think the root problem is that there are two circumstances in which we're using SolrResourceLoader, a) during core initialization when we need to call init() immediately, but wait to call inform() until after the loading latch has been released, and then b) to create new objects once the core is up and serving queries. I tried to split this out into two separate SRL implementations, one of which is private to SolrCore and used only in the constructor, and does the call-init-and-then-delay-inform dance, and the other of which is returned by SolrCore.getResourceLoader() and inits() and informs() before it returns. To be honest though, I get so confused by the code paths here that I'm not sure whether or not that would help in this case...
          Hide
          Hoss Man added a comment -

          I have a half-implemented patch hanging around somewhere that tried to clean this up a bit. I think the root problem is that there are two circumstances in which we're using SolrResourceLoader, ...

          Agreed. It's a mess, and it would be nice to clean up – but that's a huge pile of work, so i'd prefer to punt it to another issue. (which i will file soon)


          After digging into things a bit more, here are a few things i learned/realized/uncovered in no particular order...

          • in most cases, the schemaless/managed-schema code paths don't actaully replace the entire IndexSchema used by a SolrCore...
            • REST APIs use things like ManagedIndexSchema.addFieldTypes(...) which do shallow copies
              • these ManagedIndexSchema methods for mutating things and doing shallow copies already have smarts to ensure that any new ResourceAware objects get properly informed.
              • since there is no way to dynamically change the SimFactory at run time, the existing instance is re-used in all of these shallow copies and no new SimFactory instances ever need informed of the core
            • some (cloud specific) code paths use things like ZkIndexSchemaReader.updateSchema to notice if/when the schema file changes in ZK and act on that locally
              • this does (evidently) construct an entirely new ManagedIndexSchema instance
              • this is the code path that was execing after TestCloudSchemaless was finished – but I still understand when/why this was happening.
            • ChangedSchemaMergeTest is kind of a special case, because it goes out of it's way to construct a new IndexSchema and set it on an existing SolrCore even though it isn't using MangedIndexSchema
          • There was already a special kludge for SolrCoreAware SimFactories in SolrCore.initSchema
            • looks like this was originally for ensuring that the SimFactories was usable when other SolrCoreAware things (like listeners) got informed of the SolrCore and tried to use the SolrIndexSearcher (which depended on the sim)

          So i think the most straight forward solution to the problem (SimilarityFactory-ies that implement SolrCoreAware playing nice with managed schema) is to refactor that existing kludge from SolrCore.initSchema to SolrCore.setLatestSchema


          Current Patch...

          • schema-add-schema-fields-update-processor.xml - explicitly use SchemaSimilarityFactory here to help stress TestCloudSchemaless
          • ChangedSchemaMergeTest - explicitly use SchemaSimilarityFactory here to test that scenerio
          • SolrCore - refactored existing SolrCoreAware simfactory hack so that it applies anytime setLatestSchema is called
          • SchemaSimilarityFactory - switched from assertions to IllegalStateException so it's more obvious there's a problem even if assertions are disabled (no NPE)
          • SolrResourceLoader - has some nocommits i want to update with strong warnings and a link to a new jira where my & alan's comments about the lifecycle problems of objects inited after the SolrCore is loaded are tracked

          TODO:

          • cleanup & beef up nocommit comments
          • beef up ChangedSchemaMergeTest to actually change the sim used in each schema & verify it's updated (and fully functional)
          • add/update a managed schema test that does an add-field type w/ a per-fieldtype sim and sanity check that code path + input works properly and plays nicely with SchemaSimilarityFactory
          Show
          Hoss Man added a comment - I have a half-implemented patch hanging around somewhere that tried to clean this up a bit. I think the root problem is that there are two circumstances in which we're using SolrResourceLoader, ... Agreed. It's a mess, and it would be nice to clean up – but that's a huge pile of work, so i'd prefer to punt it to another issue. (which i will file soon) After digging into things a bit more, here are a few things i learned/realized/uncovered in no particular order... in most cases, the schemaless/managed-schema code paths don't actaully replace the entire IndexSchema used by a SolrCore... REST APIs use things like ManagedIndexSchema.addFieldTypes(...) which do shallow copies these ManagedIndexSchema methods for mutating things and doing shallow copies already have smarts to ensure that any new ResourceAware objects get properly informed. since there is no way to dynamically change the SimFactory at run time, the existing instance is re-used in all of these shallow copies and no new SimFactory instances ever need informed of the core some (cloud specific) code paths use things like ZkIndexSchemaReader.updateSchema to notice if/when the schema file changes in ZK and act on that locally this does (evidently) construct an entirely new ManagedIndexSchema instance this is the code path that was execing after TestCloudSchemaless was finished – but I still understand when/why this was happening. ChangedSchemaMergeTest is kind of a special case, because it goes out of it's way to construct a new IndexSchema and set it on an existing SolrCore even though it isn't using MangedIndexSchema There was already a special kludge for SolrCoreAware SimFactories in SolrCore.initSchema looks like this was originally for ensuring that the SimFactories was usable when other SolrCoreAware things (like listeners) got informed of the SolrCore and tried to use the SolrIndexSearcher (which depended on the sim) So i think the most straight forward solution to the problem (SimilarityFactory-ies that implement SolrCoreAware playing nice with managed schema) is to refactor that existing kludge from SolrCore.initSchema to SolrCore.setLatestSchema Current Patch... schema-add-schema-fields-update-processor.xml - explicitly use SchemaSimilarityFactory here to help stress TestCloudSchemaless ChangedSchemaMergeTest - explicitly use SchemaSimilarityFactory here to test that scenerio SolrCore - refactored existing SolrCoreAware simfactory hack so that it applies anytime setLatestSchema is called SchemaSimilarityFactory - switched from assertions to IllegalStateException so it's more obvious there's a problem even if assertions are disabled (no NPE) SolrResourceLoader - has some nocommits i want to update with strong warnings and a link to a new jira where my & alan's comments about the lifecycle problems of objects inited after the SolrCore is loaded are tracked TODO: cleanup & beef up nocommit comments beef up ChangedSchemaMergeTest to actually change the sim used in each schema & verify it's updated (and fully functional) add/update a managed schema test that does an add-field type w/ a per-fieldtype sim and sanity check that code path + input works properly and plays nicely with SchemaSimilarityFactory
          Hide
          Hoss Man added a comment -

          New in this patch...

          • cleanup & beef up nocommit comments to point to new SOLR-8311 trakcing jira
          • beefed up ChangedSchemaMergeTest to actually change the sim used in each schema & verify it's updated (and fully functional)
          • put some sanity checks in TestBulkSchemaAPI.testMultipleCommands
            • already had some basic verification that adding a fieldtype w/sim + field using that type workd
            • now it whitebox verifies that the the underlying SimilarityFactory for the latest schema is and returns the expected Sim for each field.

          ...still testing, but i think this is good to go.

          Show
          Hoss Man added a comment - New in this patch... cleanup & beef up nocommit comments to point to new SOLR-8311 trakcing jira beefed up ChangedSchemaMergeTest to actually change the sim used in each schema & verify it's updated (and fully functional) put some sanity checks in TestBulkSchemaAPI.testMultipleCommands already had some basic verification that adding a fieldtype w/sim + field using that type workd now it whitebox verifies that the the underlying SimilarityFactory for the latest schema is and returns the expected Sim for each field. ...still testing, but i think this is good to go.
          Hide
          ASF subversion and git services added a comment -

          Commit 1715215 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1715215 ]

          SOLR-8280: Fixed bug in SimilarityFactory initialization that prevented SolrCoreAware factories from functioning properly with managed schema features

          Show
          ASF subversion and git services added a comment - Commit 1715215 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1715215 ] SOLR-8280 : Fixed bug in SimilarityFactory initialization that prevented SolrCoreAware factories from functioning properly with managed schema features
          Hide
          ASF subversion and git services added a comment -

          Commit 1715225 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1715225 ]

          SOLR-8280: Fixed bug in SimilarityFactory initialization that prevented SolrCoreAware factories from functioning properly with managed schema features (merge r1715215)

          Show
          ASF subversion and git services added a comment - Commit 1715225 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1715225 ] SOLR-8280 : Fixed bug in SimilarityFactory initialization that prevented SolrCoreAware factories from functioning properly with managed schema features (merge r1715215)

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development