SOLR-7081: create/delete/create collection (new test case)

    Details

    • Type: Test
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2
    • Component/s: None
    • Labels: None

      Description

      Unexpectedly, the second collection create fails (reporting that the collection already exists) even though the collection delete apparently succeeded.

      Collection create/delete/create is probably an uncommon operational sequence, but the test failure may indicate that something unexpected is happening elsewhere.

      GitHub pull request and test log extracts to follow.

          Activity

          ASF GitHub Bot added a comment -

          GitHub user cpoerschke opened a pull request:

          https://github.com/apache/lucene-solr/pull/127

          SOLR-7081: create/delete/create collection (new test case)

          https://issues.apache.org/jira/i#browse/SOLR-7081

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-create-delete-create-collection

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/127.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #127


          commit 24e87d6b3e180ce644acfd1896e43cdcb512a4be
          Author: Christine Poerschke <cpoerschke@bloomberg.net>
          Date: 2015-01-21T10:14:38Z

          SOLR-????: TestMiniSolrCloudCluster.testBasics tidies up after itself, adds DoubleTestMiniSolrCloudCluster test case.

          TestMiniSolrCloudCluster.testBasics now re-creates the server it removed for test purposes, thus restoring the original NUM_SERVERS count. TestMiniSolrCloudCluster.testBasics now also deletes the collection it created for test purposes (this revision adds a MiniSolrCloudCluster.deleteCollection method).

          DoubleTestMiniSolrCloudCluster is a new test case. DoubleTestMiniSolrCloudCluster.testBasics calls TestMiniSolrCloudCluster.testBasics twice in a row.


          Christine Poerschke added a comment -

          Here's an extract of the interesting parts of the test output (`ant test -Dtestcase=DoubleTestMiniSolrCloudCluster`):

             [junit4]   2> 37011 T207 oasc.SolrException.log ERROR Failed to delete instance dir for core:testSolrCloudCollection_shard1_replica2 dir:/mydirectory/solr/build/solr-core/test/J0/temp/solr.cloud.DoubleTestMiniSolrCloudCluster A33CCC8883EFD522-001/tempDir-001/./testSolrCloudCollection_shard1_replica2
             [junit4]   2> 37012 T207 oasc.ElectionContext.cancelElection canceling election /collections/testSolrCloudCollection/leader_elect/shard1/election/93266847936610328-core_node2-n_0000000003
             ...
             [junit4]   2> 37024 T206 oasc.SolrException.log ERROR Failed to delete instance dir for core:testSolrCloudCollection_shard1_replica1 dir:/mydirectory/solr/build/solr-core/test/J0/temp/solr.cloud.DoubleTestMiniSolrCloudCluster A33CCC8883EFD522-001/tempDir-001/./testSolrCloudCollection_shard1_replica1
             [junit4]   2> 37024 T206 oasc.ElectionContext.cancelElection canceling election /collections/testSolrCloudCollection/leader_elect/shard1/election/93266847936610328-core_node3-n_0000000002
          

          Both shard1 replicas log errors deleting their instance directories (on T206 and T207).

             [junit4]   2> 37677 T13 oasc.TestMiniSolrCloudCluster.waitForCollectionToDisappear Wait for collection to disappear - collection: testSolrCloudCollection failOnTimeout:true timeout (sec):330
             ...
             [junit4]   2> 37679 T13 oasc.TestMiniSolrCloudCluster.waitForCollectionToDisappear Collection has disappeared - collection: testSolrCloudCollection
          

          But the collection is being reported as having disappeared (on T13).
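
The wait-for-disappearance step logged above can be pictured as a simple polling loop. The following is only a minimal sketch in Python, not the test's actual Java code: the function name mirrors the log line, while `list_collections`, `poll_interval_sec` and the use of `TimeoutError` are assumptions made for illustration. The default timeout of 330 seconds and the `fail_on_timeout` flag are taken from the log line.

```python
import time
from typing import Callable, Iterable


def wait_for_collection_to_disappear(
    collection: str,
    list_collections: Callable[[], Iterable[str]],  # hypothetical lookup of known collections
    timeout_sec: float = 330.0,
    poll_interval_sec: float = 0.5,  # assumed value, not from the log
    fail_on_timeout: bool = True,
) -> bool:
    """Poll until `collection` is absent from the listing or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        # Only the listing is consulted; cores that are still shutting
        # down elsewhere are invisible to this check.
        if collection not in set(list_collections()):
            return True
        if time.monotonic() >= deadline:
            if fail_on_timeout:
                raise TimeoutError(f"collection still present: {collection}")
            return False
        time.sleep(poll_interval_sec)
```

A check of this shape only confirms that the collection is gone from whatever listing it polls, so it can report success while replicas are still being torn down.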

             [junit4]   2> 37710 T13 oasu.DefaultSolrCoreState.closeIndexWriter closing IndexWriter with IndexWriterCloser
             [junit4]   2> 37709 T212 oasco.ClusterStateMutator.createCollection building a new cName: testSolrCloudCollection
             [junit4]   2> 37716 T13 oasc.SolrCore.closeSearcher [testSolrCloudCollection_shard2_replica2] Closing main searcher on request.
          

          Yet T13 still shows a shard2 replica being shut down after the collection was reported as having disappeared. Note that this is shard2, whereas the earlier delete errors were for shard1. At this point T212 is beginning the second create operation.

          Next, T264 replays operations from the work queue (possibly delete sub-operations?).

             [junit4]   2> 37793 T264 oasc.Overseer$ClusterStateUpdater.run Replaying operations from work queue.
             [junit4]   2> 37794 T264 oasc.Overseer$ClusterStateUpdater.run processMessage: queueSize: 0, message = {
             [junit4]   2> 	  "core":"testSolrCloudCollection_shard2_replica2",
             [junit4]   2> 	  "core_node_name":"core_node4",
             [junit4]   2> 	  "roles":null,
             [junit4]   2> 	  "base_url":"http://127.0.0.1:55554/solr",
             [junit4]   2> 	  "node_name":"127.0.0.1:55554_solr",
             [junit4]   2> 	  "numShards":"2",
             [junit4]   2> 	  "state":"down",
             [junit4]   2> 	  "shard":"shard2",
             [junit4]   2> 	  "collection":"testSolrCloudCollection",
             [junit4]   2> 	  "operation":"state"}
             [junit4]   2> 37795 T264 oasco.ReplicaMutator.updateState Update state numShards=2 message={
             [junit4]   2> 	  "core":"testSolrCloudCollection_shard2_replica2",
             [junit4]   2> 	  "core_node_name":"core_node4",
             [junit4]   2> 	  "roles":null,
             [junit4]   2> 	  "base_url":"http://127.0.0.1:55554/solr",
             [junit4]   2> 	  "node_name":"127.0.0.1:55554_solr",
             [junit4]   2> 	  "numShards":"2",
             [junit4]   2> 	  "state":"down",
             [junit4]   2> 	  "shard":"shard2",
             [junit4]   2> 	  "collection":"testSolrCloudCollection",
             [junit4]   2> 	  "operation":"state"}
             [junit4]   2> 37796 T264 oasco.ClusterStateMutator.createCollection building a new cName: testSolrCloudCollection
          

          Following the replay the second collection create progresses on T264.

             [junit4]   2> 41121 T280 oasc.OverseerCollectionProcessor.processMessage WARN OverseerCollectionProcessor.processMessage : create , {
             [junit4]   2> 	  "operation":"create",
             [junit4]   2> 	  "fromApi":"true",
             [junit4]   2> 	  "name":"testSolrCloudCollection",
             [junit4]   2> 	  "replicationFactor":"2",
             [junit4]   2> 	  "collection.configName":"solrCloudCollectionConfig",
             [junit4]   2> 	  "numShards":"2",
             [junit4]   2> 	  "stateFormat":"2",
             [junit4]   2> 	  "property.solr.tests.ramBufferSizeMB":"100",
             [junit4]   2> 	  "property.solr.tests.maxIndexingThreads":"-1",
             [junit4]   2> 	  "property.solr.tests.mergeScheduler":"org.apache.lucene.index.ConcurrentMergeScheduler",
             [junit4]   2> 	  "property.config":"solrconfig-tlog.xml",
             [junit4]   2> 	  "property.solr.tests.maxBufferedDocs":"100000",
             [junit4]   2> 	  "property.solr.tests.mergePolicy":"org.apache.lucene.index.TieredMergePolicy",
             [junit4]   2> 	  "property.solr.directoryFactory":"solr.RAMDirectoryFactory"}
             ...
             [junit4]   2> 41122 T280 oasc.SolrException.log ERROR Collection: testSolrCloudCollection operation: create failed:org.apache.solr.common.SolrException: collection already exists: testSolrCloudCollection
             [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.createCollection(OverseerCollectionProcessor.java:2314)
             [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:605)
             [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor$Runner.run(OverseerCollectionProcessor.java:2875)
             [junit4]   2> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
             [junit4]   2> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
             [junit4]   2> 		at java.lang.Thread.run(Thread.java:745)
          

          But then on T280 the second collection create fails.
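
One possible reading of the sequence above is that a late replica "state" message re-creates the collection entry after the delete, so the later create sees it as already existing. The toy model below is only a sketch of that guess in Python. `ToyClusterState` and all of its methods are hypothetical and are not Solr's classes, and the implicit re-creation in `apply_state_message` is an assumption, not confirmed behaviour.

```python
class ToyClusterState:
    """Toy stand-in for shared cluster state (hypothetical, not Solr code)."""

    def __init__(self):
        self.collections = {}

    def create_collection(self, name):
        if name in self.collections:
            raise RuntimeError(f"collection already exists: {name}")
        self.collections[name] = {}

    def delete_collection(self, name):
        self.collections.pop(name, None)

    def apply_state_message(self, message):
        # Assumption: a replica state update for an unknown collection
        # implicitly builds a new collection entry.
        replicas = self.collections.setdefault(message["collection"], {})
        replicas[message["core"]] = message["state"]


def replay_sequence():
    name = "testSolrCloudCollection"
    state = ToyClusterState()
    state.create_collection(name)
    state.delete_collection(name)
    disappeared = name not in state.collections  # what the wait would observe
    # A late "down" message from a replica that is still shutting down.
    state.apply_state_message(
        {"collection": name, "core": name + "_shard2_replica2", "state": "down"}
    )
    try:
        state.create_collection(name)
        outcome = "created"
    except RuntimeError as error:
        outcome = str(error)
    return disappeared, outcome
```

In this toy model the collection is observed as gone, yet the second create still fails with "collection already exists", which matches the shape of the log but does not prove the cause.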

          Mark Miller added a comment -

          I thought some test (like the collections api test) actually did this type of thing. Perhaps it's different somehow or I am remembering wrong. In either case, new testing always appreciated. Perhaps this leads to the root cause of some random fails I've seen where you surprisingly get this error.

          Mark Miller added a comment - edited

          Hmm... at first glance this looks like 'zk should be the truth' issue stuff. I really wanted to get a better start on that for 5.0. Alas.

          We should almost just release note not to count on auto core creation in Solr 5 so that we can fix this stuff by default without an option before 6.

          Ramkumar Aiyengar added a comment - edited

          We should almost just release note not to count on auto core creation in Solr 5 so that we can fix this stuff by default without an option before 6.

          +1

          Mark Miller added a comment -

          What do we put?

          Solr 5.0 only supports creating and removing SolrCloud collections through the collections API, unlike previous versions. While not using the collections API may still work in 5.0, it is unsupported, not recommended, and the behavior will change in a 5.x release.
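
For reference, a create/delete/create sequence through the Collections API comes down to three HTTP requests against the `/admin/collections` endpoint. The sketch below only builds the request URLs in Python and sends nothing. The helper names are made up for illustration, and the base URL, collection name and config name are example values borrowed from the test log above.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, params: dict) -> str:
    """Build a Collections API URL (illustrative helper, not part of Solr)."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def create_delete_create_urls(base_url: str, name: str, config_name: str) -> list:
    """Return the three request URLs for a create/delete/create sequence."""
    create = collections_api_url(
        base_url,
        "CREATE",
        {
            "name": name,
            "numShards": 2,
            "replicationFactor": 2,
            "collection.configName": config_name,
        },
    )
    delete = collections_api_url(base_url, "DELETE", {"name": name})
    return [create, delete, create]


if __name__ == "__main__":
    # Example values taken from the test log; adjust for a real cluster.
    for url in create_delete_create_urls(
        "http://127.0.0.1:55554/solr",
        "testSolrCloudCollection",
        "solrCloudCollectionConfig",
    ):
        print(url)
```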

          Christine Poerschke added a comment -

          https://github.com/apache/lucene-solr/pull/127 now rebased against latest trunk and the create/delete/create collection test case now passes.

          Ramkumar Aiyengar added a comment -

          This seems to have somehow been fixed in trunk now. I just beasted the new test ten times and it works fine. I will commit this, and we can take it up if and when Jenkins fails.

          ASF subversion and git services added a comment -

          Commit 1675590 from Ramkumar Aiyengar in branch 'dev/trunk'
          [ https://svn.apache.org/r1675590 ]

          SOLR-7081: TestMiniSolrCloudCluster.testBasics tidies up after itself, adds collection create/delete/create test case.

          TestMiniSolrCloudCluster.testBasics now re-creates the server it removed for test purposes,
          thus restoring the original NUM_SERVERS count. TestMiniSolrCloudCluster.testBasics now also deletes
          the collection it created for test purposes (this revision adds MiniSolrCloudCluster.deleteCollection
          and AbstractDistribZkTestBase.waitForCollectionToDisappear methods).

          Sometimes TestMiniSolrCloudCluster.testBasics runs its create-collection/search-collection/delete-collection
          logic twice, thus creating a create/delete/create-collection test case.

          ASF subversion and git services added a comment -

          Commit 1676024 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1676024 ]

          SOLR-7081: TestMiniSolrCloudCluster.testBasics tidies up after itself, adds collection create/delete/create test case.

          TestMiniSolrCloudCluster.testBasics now re-creates the server it removed for test purposes,
          thus restoring the original NUM_SERVERS count. TestMiniSolrCloudCluster.testBasics now also deletes
          the collection it created for test purposes (this revision adds MiniSolrCloudCluster.deleteCollection
          and AbstractDistribZkTestBase.waitForCollectionToDisappear methods).

          Sometimes TestMiniSolrCloudCluster.testBasics runs its create-collection/search-collection/delete-collection
          logic twice, thus creating a create/delete/create-collection test case.

          Ramkumar Aiyengar added a comment -

          Thanks, Christine.

          Anshum Gupta added a comment -

          Bulk close for 5.2.0.

          ASF GitHub Bot added a comment -

          GitHub user cpoerschke closed the pull request at:

          https://github.com/apache/lucene-solr/pull/127


            People

            • Assignee: Ramkumar Aiyengar
            • Reporter: Christine Poerschke
            • Votes: 0
            • Watchers: 6
