Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None
    1. SOLR-3815_addrange.patch
      8 kB
      Yonik Seeley
    2. SOLR-3815_clusterState_immutable.patch
      16 kB
      Yonik Seeley
    3. SOLR-3815.patch
      60 kB
      Yonik Seeley

      Activity

      Hide
      Yonik Seeley added a comment -

      This issue is for

      • pulling apart the different uses of ZkNodeProps - Replica vs other generic uses
      • changing properties from Map<String,String> to Map<String,Object>
      • adding generic properties to slices
      • adding Range to slice properties
      Show
      Yonik Seeley added a comment - This issue is for pulling apart the different uses of ZkNodeProps - Replica vs other generic uses changing properties from Map<String,String> to Map<String,Object> adding generic properties to slices adding Range to slice properties
      Hide
      Yonik Seeley added a comment -

      Here's a patch that does the bare minimum to separate out Replica from ZkNodeProps. My first two attempts led to tests that didn't pass somehow, so I'm taking it slower this time.

      Show
      Yonik Seeley added a comment - Here's a patch that does the bare minimum to separate out Replica from ZkNodeProps. My first two attempts led to tests that didn't pass somehow, so I'm taking it slower this time.
      Hide
      Yonik Seeley added a comment -

      Committed:
      trunk - http://svn.apache.org/viewvc?view=revision&revision=1382621
      4x - http://svn.apache.org/viewvc?view=revision&revision=1382622

      boy do I miss the automatic linking of svn commits to JIRA issues!

      Show
      Yonik Seeley added a comment - Committed: trunk - http://svn.apache.org/viewvc?view=revision&revision=1382621 4x - http://svn.apache.org/viewvc?view=revision&revision=1382622 boy do I miss the automatic linking of svn commits to JIRA issues!
      Hide
      Yonik Seeley added a comment -

      As discussed in SOLR-3755, I've added an additional level to Slice so things like "range" and "replicationFactor" can be siblings of "replicas" (in my local copy - currently uncommitted).

      Example:

          "shard5":{"replicas":{"127.0.0.1:58050_solr_collection1":{
                "shard":"shard5",
                "leader":"true",
                "roles":null,
                "state":"active",
                "core":"collection1",
                "collection":"collection1",
                "node_name":"127.0.0.1:58050_solr",
                "base_url":"http://127.0.0.1:58050/solr"}}}},
      

      Since we're breaking back compat (just with older trunk), we should take the opportunity to make other changes as well.
      One candidate is to make the same change for Collections - adding an additional level so that other collection stats/properties can live alongside "shards" that contain the shards for the collection.

      Current:

        "collection1":{
          "shard3":{"replicas":{"127.0.0.1:58046_solr_collection1":{
      

      Proposal:

        "collection1":{
          "shards" : {
            "shard3":{"replicas":{"127.0.0.1:58046_solr_collection1":{
      

      Thoughts?
      Any other candidates for name changes while we are breaking compat anyway?

      Show
      Yonik Seeley added a comment - As discussed in SOLR-3755 , I've added an additional level to Slice so things like "range" and "replicationFactor" can be siblings of "replicas" (in my local copy - currently uncommitted). Example: "shard5" :{ "replicas" :{ "127.0.0.1:58050_solr_collection1" :{ "shard" : "shard5" , "leader" : " true " , "roles" : null , "state" : "active" , "core" : "collection1" , "collection" : "collection1" , "node_name" : "127.0.0.1:58050_solr" , "base_url" : "http: //127.0.0.1:58050/solr" }}}}, Since we're breaking back compat (just with older trunk), we should take the opportunity to make other changes as well. One candidate is to make the same change for Collections - adding an additional level so that other collection stats/properties can live alongside "shards" that contain the shards for the collection. Current: "collection1" :{ "shard3" :{ "replicas" :{ "127.0.0.1:58046_solr_collection1" :{ Proposal: "collection1" :{ "shards" : { "shard3" :{ "replicas" :{ "127.0.0.1:58046_solr_collection1" :{ Thoughts? Any other candidates for name changes while we are breaking compat anyway?
      Hide
      Yonik Seeley added a comment -

      Here's a start on adding ranges to shard properties. Seems to work at first and then gets lost on an update currently.

      Example:

      {"collection1":{
          "shard1":{"replicas":{"Rogue:8983_solr_collection1":{
                "shard":"shard1",
                "leader":"true",
                "roles":null,
                "state":"active",
                "core":"collection1",
                "collection":"collection1",
                "node_name":"Rogue:8983_solr",
                "base_url":"http://Rogue:8983/solr"}}},
          "shard2":{
            "range":"0-7fffffff",
            "replicas":{}}}}
      
      Show
      Yonik Seeley added a comment - Here's a start on adding ranges to shard properties. Seems to work at first and then gets lost on an update currently. Example: { "collection1" :{ "shard1" :{ "replicas" :{ "Rogue:8983_solr_collection1" :{ "shard" : "shard1" , "leader" : " true " , "roles" : null , "state" : "active" , "core" : "collection1" , "collection" : "collection1" , "node_name" : "Rogue:8983_solr" , "base_url" : "http: //Rogue:8983/solr" }}}, "shard2" :{ "range" : "0-7fffffff" , "replicas" :{}}}}
      Hide
      Yonik Seeley added a comment - - edited

      Folks, while working to add the "replicas" level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure.

      I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer).

      Any idea what's going on?

      edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures.

      edit: in Overseer.run() we have "ClusterState clusterState = reader.getClusterState();" and that is the state that is accidentally being modified (that accidentally makes things work). I assume the reader is supposed to update it's state via zookeeper - which means there is perhaps something wrong with reader.updateClusterState(true)?

      Show
      Yonik Seeley added a comment - - edited Folks, while working to add the "replicas" level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. edit: in Overseer.run() we have "ClusterState clusterState = reader.getClusterState();" and that is the state that is accidentally being modified (that accidentally makes things work). I assume the reader is supposed to update it's state via zookeeper - which means there is perhaps something wrong with reader.updateClusterState(true)?
      Hide
      Sami Siren added a comment -

      Yonik, I was trying to reproduce what you see and applied SOLR-3815_clusterState_immutable.patch. do you mean that with that patch (which modifies the existing state inside the nocommits block) the LeaderElectionIntegrationTest should pass? It fails for me sporadically around 50% of time.

      wrt. editing the immutable state. I think the problem may be in the shared ZKStateReader which is used by the overseer and rest of the system. Shouldn't the overseer create it's own reader so that it cannot modify the state (which should not be happening in the first place, bug, bug!) that other parts running on that same instance see and use.

      Show
      Sami Siren added a comment - Yonik, I was trying to reproduce what you see and applied SOLR-3815 _clusterState_immutable.patch. do you mean that with that patch (which modifies the existing state inside the nocommits block) the LeaderElectionIntegrationTest should pass? It fails for me sporadically around 50% of time. wrt. editing the immutable state. I think the problem may be in the shared ZKStateReader which is used by the overseer and rest of the system. Shouldn't the overseer create it's own reader so that it cannot modify the state (which should not be happening in the first place, bug, bug!) that other parts running on that same instance see and use.
      Hide
      Sami Siren added a comment -

      It appears that if I create a new instance of the ZkStateReader in Overseer constructor for overseer to use the test starts to fail 100% of time which seems to suggest the side-effect of editing the existing state is gone.

      I did not try to look at the actual failure yet since i am not sure if I am looking at the right thing...

      Show
      Sami Siren added a comment - It appears that if I create a new instance of the ZkStateReader in Overseer constructor for overseer to use the test starts to fail 100% of time which seems to suggest the side-effect of editing the existing state is gone. I did not try to look at the actual failure yet since i am not sure if I am looking at the right thing...
      Hide
      Mark Miller added a comment -

      LeaderElectionIntegrationTest

      When I commit my latest work today, one thing I did was make this test more stable. Once I commit, it should be easier to determine if this was involved in your change or not.

      Show
      Mark Miller added a comment - LeaderElectionIntegrationTest When I commit my latest work today, one thing I did was make this test more stable. Once I commit, it should be easier to determine if this was involved in your change or not.
      Hide
      Yonik Seeley added a comment -

      do you mean that with that patch (which modifies the existing state inside the nocommits block) the LeaderElectionIntegrationTest should pass? It fails for me sporadically around 50% of time.

      OK, perhaps it's because I modify less of the original state that the original code does (that's commented out).
      Just to clarify, I don't want to modify the original state - it was just the only way I could get the test to pass sometimes (because I noticed the original code modified the state).

      I think the problem may be in the shared ZKStateReader which is used by the overseer and rest of the system.

      Ah, that would explain it - I hadn't realized the single instance was shared. The state not really being immutable could definitely cause issues then!

      Show
      Yonik Seeley added a comment - do you mean that with that patch (which modifies the existing state inside the nocommits block) the LeaderElectionIntegrationTest should pass? It fails for me sporadically around 50% of time. OK, perhaps it's because I modify less of the original state that the original code does (that's commented out). Just to clarify, I don't want to modify the original state - it was just the only way I could get the test to pass sometimes (because I noticed the original code modified the state). I think the problem may be in the shared ZKStateReader which is used by the overseer and rest of the system. Ah, that would explain it - I hadn't realized the single instance was shared. The state not really being immutable could definitely cause issues then!
      Hide
      Yonik Seeley added a comment -

      I just marked this issue a blocker for 4.0
      We want to get the clusterstate.json format finalized for 4.0 so we can implement splitting, etc, at any point in the 4x line (hopefully sooner rather than later).

      Show
      Yonik Seeley added a comment - I just marked this issue a blocker for 4.0 We want to get the clusterstate.json format finalized for 4.0 so we can implement splitting, etc, at any point in the 4x line (hopefully sooner rather than later).
      Hide
      Mark Miller added a comment -

      Ah, that would explain it - I hadn't realized the single instance was shared. The state not really being immutable could definitely cause issues then!

      If I remember right, I'm the one that prompted Sami to share this - with the idea that overseer didn't have to get the clusterstate.json twice on every change. Didn't realize it would introduce an issue though.

      Show
      Mark Miller added a comment - Ah, that would explain it - I hadn't realized the single instance was shared. The state not really being immutable could definitely cause issues then! If I remember right, I'm the one that prompted Sami to share this - with the idea that overseer didn't have to get the clusterstate.json twice on every change. Didn't realize it would introduce an issue though.
      Hide
      Yonik Seeley added a comment - - edited

      with the idea that overseer didn't have to get the clusterstate.json twice on every change.

      That would be fine if the state were actually immutable. But this test must have been relying on getting the info through the back door (I hope it's mainly just a test issue at least).

      Show
      Yonik Seeley added a comment - - edited with the idea that overseer didn't have to get the clusterstate.json twice on every change. That would be fine if the state were actually immutable. But this test must have been relying on getting the info through the back door (I hope it's mainly just a test issue at least).
      Hide
      Yonik Seeley added a comment -

      When I commit my latest work today, one thing I did was make this test more stable.

      Yep, looks like the test was fixed!

      I've committed the changes to add the "replicas" level for each slice to both trunk and 4x:
      http://svn.apache.org/viewvc?rev=1386858&view=rev

      Show
      Yonik Seeley added a comment - When I commit my latest work today, one thing I did was make this test more stable. Yep, looks like the test was fixed! I've committed the changes to add the "replicas" level for each slice to both trunk and 4x: http://svn.apache.org/viewvc?rev=1386858&view=rev
      Hide
      Yonik Seeley added a comment -

      Looks like Overseer.setShardLeader also modifies previously created (immutable) state. I'll fix.

      Show
      Yonik Seeley added a comment - Looks like Overseer.setShardLeader also modifies previously created (immutable) state. I'll fix.
      Hide
      Robert Muir added a comment -

      I think you intended to backport r1386858 to 4.x?

      Show
      Robert Muir added a comment - I think you intended to backport r1386858 to 4.x?
      Hide
      Yonik Seeley added a comment -

      I think you intended to backport r1386858 to 4.x?

      Yep - I hadn't noticed my commit failed. Committed in r1387245

      Show
      Yonik Seeley added a comment - I think you intended to backport r1386858 to 4.x? Yep - I hadn't noticed my commit failed. Committed in r1387245
      Hide
      Yonik Seeley added a comment -
      Show
      Yonik Seeley added a comment - Committed Overseer.setShardLeader fix. trunk: http://svn.apache.org/viewvc?rev=1387354&view=rev 4x: http://svn.apache.org/viewvc?rev=1387355&view=rev
      Hide
      Yonik Seeley added a comment -
      Show
      Yonik Seeley added a comment - Committed fix to preserve shard properties: trunk: http://svn.apache.org/viewvc?rev=1387747&view=rev 4x: http://svn.apache.org/viewvc?rev=1387749&view=rev
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Yonik Seeley
      http://svn.apache.org/viewvc?view=revision&revision=1387749

      SOLR-3815: preserve shard properties

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1387749 SOLR-3815 : preserve shard properties
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Yonik Seeley
      http://svn.apache.org/viewvc?view=revision&revision=1387355

      SOLR-3815: fix overseer.setShardLeader to not modify existing state

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1387355 SOLR-3815 : fix overseer.setShardLeader to not modify existing state
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Yonik Seeley
      http://svn.apache.org/viewvc?view=revision&revision=1387245

      SOLR-3815: change clusterstate structure to add properties to slices

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1387245 SOLR-3815 : change clusterstate structure to add properties to slices
      Hide
      Commit Tag Bot added a comment -

      [branch_4x commit] Yonik Seeley
      http://svn.apache.org/viewvc?view=revision&revision=1382622

      SOLR-3815: separate ZkNodeProps from Replica, change properties from String values to Object values

      Show
      Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1382622 SOLR-3815 : separate ZkNodeProps from Replica, change properties from String values to Object values
      Hide
      Uwe Schindler added a comment -

      Closed after release.

      Show
      Uwe Schindler added a comment - Closed after release.

        People

        • Assignee:
          Unassigned
          Reporter:
          Yonik Seeley
        • Votes:
          0 Vote for this issue
          Watchers:
          2 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development