SOLR-4075

Upon removing the last core of a shard, the shard is not removed from the cluster state.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: SolrCloud
    • Labels:

      Description

      Simple repro of this issue. I originally thought it was related to the decision made in https://issues.apache.org/jira/browse/SOLR-3080, but Mark tells me it might have been a problem introduced during the zk layout refactoring right before 4.0.

      1. Download the Solr 4.0.0 release and extract it.
      2. Replace apache-solr-4.0.0/example/solr/solr.xml with:

      <?xml version="1.0" encoding="UTF-8" ?>
      <solr persistent="true">
        <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
          <core shard="shard1" instanceDir="collection1/" name="collection1" collection="polecat"/>
          <core shard="shard1" instanceDir="collection2/" name="collection2" collection="polecat"/>
          <core schema="schema.xml" shard="core3" instanceDir="core3/" name="core3" config="solrconfig.xml" collection="polecat" dataDir="data"/>
        </cores>
      </solr>

      3. Start Solr with: java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -Dsolrcloud.skip.autorecovery=true -jar start.jar
      (solrcloud.skip.autorecovery is set because the shards did not exist previously)

      Then run this:
      Sanity query: http://localhost:8983/solr/polecat/select?q=*%3A*&wt=xml&distrib=true
      Remove the core: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3&deleteIndex=true
      Error query: http://localhost:8983/solr/polecat/select?q=*%3A*&wt=xml&distrib=true
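
The three requests above can also be generated from a script. The following is a minimal sketch, assuming the host, port, collection, and core names from this report; the helper names `select_all_url` and `unload_url` are hypothetical and not part of Solr. Note that `urlencode` also percent-encodes the `*` in `*:*`, which is equivalent to the hand-written `*%3A*`.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # host and port taken from the report


def select_all_url(collection: str) -> str:
    """Build a distributed match-all query URL for a collection."""
    params = {"q": "*:*", "wt": "xml", "distrib": "true"}
    return f"{BASE}/{collection}/select?{urlencode(params)}"


def unload_url(core: str) -> str:
    """Build a CoreAdmin UNLOAD URL that also deletes the core's index."""
    params = {"action": "UNLOAD", "core": core, "deleteIndex": "true"}
    return f"{BASE}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Print the requests in the order the report issues them.
    print(select_all_url("polecat"))  # sanity query
    print(unload_url("core3"))        # unload the only core of shard "core3"
    print(select_all_url("polecat"))  # error query
```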

      The sanity query returns 0 records, and the error query fails with "no servers hosting shard:". The shard is still listed in clusterstate.json with no replicas: "core3":{"replicas":{}}}}
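
To spot this symptom without reading clusterstate.json by eye, a small check can list every shard that has no replicas left. This is only a sketch: it assumes the cluster state maps collection to shard to a `"replicas"` object, which is inferred from the `"core3":{"replicas":{}}` fragment above and may differ between Solr versions. The function name and the sample data (other than the `core3` entry) are hypothetical.

```python
import json


def empty_shards(cluster_state: dict) -> list:
    """Return (collection, shard) pairs whose shard has no replicas left."""
    found = []
    for collection, shards in cluster_state.items():
        for shard, props in shards.items():
            # A missing or empty "replicas" object means nothing hosts the shard.
            if not props.get("replicas"):
                found.append((collection, shard))
    return found


# Hypothetical sample in the assumed layout; only the "core3" entry
# mirrors the fragment quoted in the report.
SAMPLE = json.loads("""
{"polecat": {
    "shard1": {"replicas": {"replica_a": {}, "replica_b": {}}},
    "core3":  {"replicas": {}}
}}
""")

if __name__ == "__main__":
    print(empty_shards(SAMPLE))
```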

        Activity

        Mark Miller added a comment -

        Hmm - well, adding a quick test that creates 6 cores in a collection and then unloads them shows the collection go away in clusterstate.json. So something interesting must be happening here...

        Gilles Comeau added a comment -

        Agreed. I've done a similar test using this solr.xml:
        <?xml version="1.0" encoding="UTF-8" ?>
        <solr persistent="true">
          <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
            <core shard="shard1" instanceDir="collection1/" name="collection1" collection="polecat"/>
            <core shard="shard1" instanceDir="collection2/" name="collection2" collection="polecat"/>
            <core schema="schema.xml" shard="core3" instanceDir="core3/" name="core3" config="solrconfig.xml" collection="polecat" dataDir="data"/>
            <core schema="schema.xml" shard="core4" instanceDir="core4/" name="core4" config="solrconfig.xml" collection="polecat2" dataDir="data"/>
            <core schema="schema.xml" shard="core5" instanceDir="core5/" name="core5" config="solrconfig.xml" collection="polecat2" dataDir="data"/>
          </cores>
        </solr>

        and

        http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core4&deleteIndex=true
        http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core5&deleteIndex=true

        and the polecat2 collection is removed with core5.

        I do get the "no servers hosting shard:" error after removing core4 and before removing core5.

        So the bug is "removing the last core does not remove the shard", while "removing the last core removes the collection" is working fine?
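
The distinction drawn here can be illustrated with a toy model of the cluster state. The sketch below is not the committed patch or Solr's actual code; it only shows the expected behavior under an assumed collection to shard to `"replicas"` layout, with replicas keyed by core name purely for illustration. The function `remove_core` is hypothetical.

```python
def remove_core(cluster_state: dict, collection: str, shard: str, core: str) -> None:
    """Remove a core, then prune the shard and collection if they become empty."""
    shards = cluster_state.get(collection, {})
    replicas = shards.get(shard, {}).get("replicas", {})
    replicas.pop(core, None)
    # Pruning the now-empty shard is the step the report found missing.
    if shard in shards and not replicas:
        del shards[shard]
    # Pruning the now-empty collection is the step that already worked.
    if collection in cluster_state and not shards:
        del cluster_state[collection]


if __name__ == "__main__":
    state = {
        "polecat2": {
            "core4": {"replicas": {"core4": {}}},
            "core5": {"replicas": {"core5": {}}},
        }
    }
    remove_core(state, "polecat2", "core4", "core4")
    print(state)  # shard "core4" is gone, "core5" remains
    remove_core(state, "polecat2", "core5", "core5")
    print(state)  # the collection is gone as well
```

With this behavior, a query issued between the two unloads would no longer reference a shard that has no servers hosting it.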

        Mark Miller added a comment -

        Ah, yes, I bet that is it.

        Mark Miller added a comment -

        Here is a first patch with a fix - test is visual - I have to add some checks to it.

        Gilles Comeau added a comment -

        This worked perfectly for us in test today, and we're going to put it into production quite shortly. Thank you Mark!

        Commit Tag Bot added a comment -

        [trunk commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1411450

        SOLR-4075: A logical shard that has had all of its SolrCores unloaded should be removed from the cluster state.

        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1411451

        SOLR-4075: A logical shard that has had all of its SolrCores unloaded should be removed from the cluster state.

        Mark Miller added a comment -

        Thanks for the detailed report Gilles!


          People

          • Assignee: Mark Miller
          • Reporter: Gilles Comeau
          • Votes: 0
          • Watchers: 3
