SOLR-7172

addreplica API fails with incorrect error msg "cannot create collection"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.3, 5.0
    • Fix Version/s: 5.3, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Steps to reproduce:

      1. Create a 1-node SolrCloud cluster
      2. Create collection 'test' with numShards=1&replicationFactor=1&maxShardsPerNode=1
      3. Call addreplica API:
        http://localhost:8983/solr/admin/collections?action=addreplica&collection=test&shard=shard1&wt=json 
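
The two Collections API calls in the steps above can be assembled programmatically. The following is only a sketch: the class and method names (ReproUrls, createUrl, addReplicaUrl) are made up for illustration, the base URL is the one from the report, and the CREATE URL is an assumption reconstructed from the parameters listed in step 2 (the report does not show it).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: builds the request URLs for the repro steps.
final class ReproUrls {
    // Base URL taken from the report; adjust for your own cluster.
    static final String BASE = "http://localhost:8983/solr";

    // Joins the parameters into a query string, preserving insertion order.
    static String collectionsUrl(Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
            .collect(Collectors.joining("&"));
        return BASE + "/admin/collections?" + query;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Step 2 (assumed form): create collection 'test' with one shard and one replica.
    static String createUrl() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", "test");
        p.put("numShards", "1");
        p.put("replicationFactor", "1");
        p.put("maxShardsPerNode", "1");
        p.put("wt", "json");
        return collectionsUrl(p);
    }

    // Step 3: the addreplica call exactly as given in the report.
    static String addReplicaUrl() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "addreplica");
        p.put("collection", "test");
        p.put("shard", "shard1");
        p.put("wt", "json");
        return collectionsUrl(p);
    }

    public static void main(String[] args) {
        System.out.println(createUrl());
        System.out.println(addReplicaUrl());
    }
}
```

Issuing the second URL against a single-node cluster that already hosts the only allowed replica is what triggers the response shown in this report.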
        

      API fails with the following response:

      {
        "responseHeader": {
          "status": 400,
          "QTime": 9
        },
        "Operation ADDREPLICA caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Cannot create collection test. No live Solr-instances",
        "exception": {
          "msg": "Cannot create collection test. No live Solr-instances",
          "rspCode": 400
        },
        "error": {
          "msg": "Cannot create collection test. No live Solr-instances",
          "code": 400
        }
      }
      
      1. SOLR-7172.patch
        29 kB
        Erick Erickson
      2. SOLR-7172.patch
        28 kB
        Erick Erickson
      3. SOLR-7172.patch
        10 kB
        Erick Erickson

        Activity

        Hoss Man added a comment -

        I'm confused by the problem statement here.

        Is the problem that there is a bug in addreplica that needs to be fixed, or is the problem that when addreplica fails, it fails with a confusing/misleading error message about creating a collection?

        Shalin Shekhar Mangar added a comment -

        "Is the problem that there is a bug in addreplica that needs to be fixed, or is the problem that when addreplica fails, it fails with a confusing/misleading error message about creating a collection?"

        It's the latter. The error message is wrong/confusing.

        Erick Erickson added a comment -

        Shalin Shekhar Mangar: I happen to be in this code for another JIRA. If I find the cause, should I just fix it, or are you already working on it?

        Shalin Shekhar Mangar added a comment -

        Please go ahead!

        Erick Erickson added a comment -

        Shalin Shekhar Mangar, Noble Paul: pinging you two since you've been in Assign more recently than I have, and this looks bogus. Then again, it's late and I suspect I'm missing something obvious, so before I dive into this: am I off base?

        Anyway, Assign.getNodesForNewShard doesn't make sense to me. First of all, it's only called from CREATESHARD and ADDREPLICA. It looks like copy/paste from CREATE, though, and some of its assumptions just don't seem to hold. The error message about "cannot create collection" is totally bogus, since this is never called from CREATE (thus this JIRA, which I think is much more serious than a wonky error message).

        Anyway, part of the problem is in the calculations around line 208:

            int maxCoresAllowedToCreate = maxShardsPerNode * nodeList.size();
            int requestedCoresToCreate = numSlices * repFactor;
            int minCoresToCreate = requestedCoresToCreate;
            if (maxCoresAllowedToCreate < minCoresToCreate) {
              // throws an exception with a long, complex error message (elided here)
            }

        For these two operations, this calculation doesn't take into account the replicas of the collection that are already on the nodes in nodeList. It also seems to me that nodeList is the wrong thing to be looking at: we've already collected, in nodeNameVsShardCount, the list of nodes we could put additional replicas on, along with the counts of replicas belonging to the collection in question already on those nodes. Shouldn't we be using those? And shouldn't the error be thrown if the number of available slots < numberOfNodes? I don't think the number of available slots is calculated correctly.

        How this interacts with rules is a mystery to me, though, and I don't want to wade around in this without a check. The attached patch is full of nocommits but shows what I had in mind. It's late, so don't look too closely; if you two think this is on track, I'll make it a real patch.
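
To make the concern above concrete, here is a minimal sketch contrasting the capacity check quoted from Assign with one that subtracts the replicas already placed. It is not the code from the patch: the class ReplicaSlots and its methods are hypothetical, and nodeNameVsShardCount is simplified here to a map from node name to the number of replicas of the collection already on that node, which is an assumption about its shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Solr code.
final class ReplicaSlots {

    // The quoted check: capacity is maxShardsPerNode times the node count,
    // regardless of what is already hosted on those nodes.
    static int naiveCapacity(int nodeCount, int maxShardsPerNode) {
        return maxShardsPerNode * nodeCount;
    }

    // Alternative: count only the free slots, i.e. maxShardsPerNode minus the
    // replicas of this collection already on each node, never below zero.
    static int availableSlots(Map<String, Integer> nodeNameVsShardCount, int maxShardsPerNode) {
        int slots = 0;
        for (int existing : nodeNameVsShardCount.values()) {
            slots += Math.max(0, maxShardsPerNode - existing);
        }
        return slots;
    }

    public static void main(String[] args) {
        // The scenario from the report: one node already holding the single replica.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("localhost:8983_solr", 1); // node name is illustrative
        int maxShardsPerNode = 1;

        System.out.println("naive capacity:  " + naiveCapacity(counts.size(), maxShardsPerNode));
        System.out.println("available slots: " + availableSlots(counts, maxShardsPerNode));
    }
}
```

For the one-node scenario the naive figure is 1 while the free-slot count is 0, which is the gap the comment describes: a request for one more replica looks admissible under the first calculation even though the node is already full.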

        Noble Paul added a comment -

        "It seems to me that nodeList is the wrong thing to be looking at as well, we've already collected a list of nodes we could put additional replicas on"

        Yes, your observation is correct. nodeNameVsShardCount has the appropriate nodes, and it should be used to arrive at maxCoresAllowedToCreate.

        "How this interacts with rules is a mystery to me though"

        Rules actually ignore the maxShardsPerNode param. Node selection relies entirely on the rules.

        While you are working on this, please rename the class Node to something like ReplicaCount, or whatever is suitable.

        Erick Erickson added a comment -

        I think this patch is ready. It does several things:

        1> returns more appropriate error messages
        2> correctly calculates whether we should be able to create a replica when adding a shard or replica
        3> adds a bunch of tests.
        4> uses the same logic for allowing replicas to be added when node/nodeset is specified on ADDREPLICA and CREATESHARD

        I haven't run this through precommit or the full test suite yet; I'm putting this version up for people to look at if they want.
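
As an illustration of points 1> and 4> above, the sketch below shows one way a single capacity check could be shared by ADDREPLICA and CREATESHARD while naming the failing operation in its message. Everything in it is hypothetical (the PlacementCheck class, the message wording, and the use of IllegalStateException instead of Solr's own exception type); the actual patch is in the attachments.

```java
import java.util.Locale;

// Hypothetical illustration, not the code from the patch.
final class PlacementCheck {

    // The two operations the thread says share this code path.
    enum Operation { ADDREPLICA, CREATESHARD }

    // Builds a message that names the operation that failed, rather than
    // claiming that a collection could not be created.
    static String errorMessage(Operation op, String collection, int requested, int available) {
        // An explicit locale keeps the formatted numbers independent of the JVM default.
        return String.format(Locale.ROOT,
            "Cannot complete %s for collection %s: %d replica(s) requested but only %d slot(s) available",
            op, collection, requested, available);
    }

    // Shared check: both operations fail the same way when there is no room.
    static void ensureCapacity(Operation op, String collection, int requested, int available) {
        if (requested > available) {
            throw new IllegalStateException(errorMessage(op, collection, requested, available));
        }
    }

    public static void main(String[] args) {
        try {
            ensureCapacity(Operation.ADDREPLICA, "test", 1, 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the check in one place means a fix to the slot arithmetic or to the wording applies to both operations at once.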

        Erick Erickson added a comment -

        I'll have a new patch up today/tomorrow. It fixes a precommit problem (String.format) and closes a test resource; no real functional changes.

        Erick Erickson added a comment -

        Final patch. All tests pass, precommit passes, etc.

        ASF subversion and git services added a comment -

        Commit 1690341 from Erick Erickson in branch 'dev/trunk'
        [ https://svn.apache.org/r1690341 ]

        SOLR-7172: addreplica API fails with incorrect error msg 'cannot create collection'

        ASF subversion and git services added a comment -

        Commit 1690348 from Erick Erickson in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1690348 ]

        SOLR-7172: addreplica API fails with incorrect error msg 'cannot create collection'

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee:
            Erick Erickson
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0
            Watchers:
            5
