Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 5.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      We can currently easily add replicas to handle increases in query volume, but we should also add a way to add additional shards dynamically by splitting existing shards.

      Attachments

      1. SOLR-3755.patch
        84 kB
        Shalin Shekhar Mangar
      2. SOLR-3755.patch
        83 kB
        Shalin Shekhar Mangar
      3. SOLR-3755.patch
        81 kB
        Shalin Shekhar Mangar
      4. SOLR-3755.patch
        64 kB
        Shalin Shekhar Mangar
      5. SOLR-3755.patch
        60 kB
        Shalin Shekhar Mangar
      6. SOLR-3755.patch
        65 kB
        Shalin Shekhar Mangar
      7. SOLR-3755.patch
        47 kB
        Mark Miller
      8. SOLR-3755.patch
        52 kB
        Mark Miller
      9. SOLR-3755.patch
        19 kB
        Yonik Seeley
      10. SOLR-3755.patch
        14 kB
        Yonik Seeley
      11. SOLR-3755-combined.patch
        36 kB
        Shalin Shekhar Mangar
      12. SOLR-3755-combinedWithReplication.patch
        41 kB
        Anshum Gupta
      13. SOLR-3755-CoreAdmin.patch
        1 kB
        Anshum Gupta
      14. SOLR-3755-testSplitter.patch
        8 kB
        Shalin Shekhar Mangar
      15. SOLR-3755-testSplitter.patch
        5 kB
        Shalin Shekhar Mangar

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Uwe Schindler added a comment -

          Closed after release.

          Shalin Shekhar Mangar added a comment -

          Yeah, that'll work. We have an issue open to track this: SOLR-4745

          Mark Miller added a comment -

          I don't see the 'easy' fix unfortunately.

           Okay, I think I found it - doing this stuff at the bottom of the SolrCore constructor rather than in preRegister seems to work so far.

          Mark Miller added a comment -

          I'll revert the change to the preRegister method signature and find another way.

           I'm trying to look at this now. I'm not sure how to go about solving this in an 'easy' way. Currently, you have to start buffering those updates before publishing, but I want it so that you publish as DOWN before creating the SolrCore - yet you need the SolrCore to start buffering.

          I don't see the 'easy' fix unfortunately.

          Shalin Shekhar Mangar added a comment -

          This feature will be released with Solr 4.3

          Shalin Shekhar Mangar added a comment -

           I haven't seen the test failure due to the extra document since increasing the read timeout values in the test. Now that 4.3 is about to be released with this feature, I'm going to mark this issue as resolved.

          Shalin Shekhar Mangar added a comment -

          Is this true for all the exceptions or just the one that follows this line? I wasn't able to reproduce this on my system running Java7.

           The error with the failing add doc happens with Java 6 – I haven't seen it with any other version. I've seen the version conflict exception on Java 7 and Java 8.

          Also, are these consistent failures?

           Yes, but only on Jenkins! I've had EC2 boxes running these tests all night and I haven't seen a failure in over 500 runs. These failures are very environment- and timing-dependent.
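
           For reference, one way to hammer these tests locally is the build's randomized-testing knobs (a sketch; the iteration count and seed are placeholders):

           ant test -Dtestcase=ShardSplitTest -Dtests.iters=100
           ant test -Dtestcase=ChaosMonkeyShardSplitTest -Dtests.iters=100 -Dtests.seed=DEADBEEF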

          Anshum Gupta added a comment -

          This happens mostly with Lucene-Solr-Tests-4.x-Java6 builds.

          Is this true for all the exceptions or just the one that follows this line? I wasn't able to reproduce this on my system running Java7.
          Also, are these consistent failures?

          Mark Miller added a comment -

          I'll revert the change to the preRegister method signature and find another way.

          I'm happy to help on this - it might be easier to just create a new issue rather than reverting, and work on getting it nicer from there, up to you though.

          Shalin Shekhar Mangar added a comment -

          Anshum suggested over chat that we should think about combining ShardSplitTest and ChaosMonkeyShardSplit tests into one to avoid code duplication. I'll try to see if we can do that.

          I've changed ChaosMonkeyShardSplitTest to extend ShardSplitTest so that we can share most of the code. The ChaosMonkey test is not completely correct and I intend to improve it.

          The original change around this made preRegister start taking a core rather than a core descriptor. I'd like to work that out so it doesn't need to be the case.

          I'll revert the change to the preRegister method signature and find another way.

          I've found two kinds of test failures of (ChaosMonkey)ShardSplitTest.

          The first is because of the following sequence of events:

           1. A doc addition fails (because of the kill-leader jetty command), the client throws an exception, and therefore the docCount variable is not incremented inside the index thread.
           2. However, the doc addition is recorded in the update logs (of the proxy node?) and replayed on the new leader, so in reality the doc does get added to the shard.
           3. The split happens and we assert that docCounts are equal on the server, which fails because the server has the document that we have not counted.

          This happens mostly with Lucene-Solr-Tests-4.x-Java6 builds. The bug is in the tests and not in the split code. Following is the stack trace:

          [junit4:junit4]   1> ERROR - 2013-04-14 14:24:27.697; org.apache.solr.cloud.ChaosMonkeyShardSplitTest$1; Exception while adding doc
          [junit4:junit4]   1> org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://127.0.0.1:34203/h/y/collection1, http://127.0.0.1:34304/h/y/collection1, http://127.0.0.1:34311/h/y/collection1, http://127.0.0.1:34270/h/y/collection1]
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:333)
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:306)
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
          [junit4:junit4]   1> 	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.indexDoc(AbstractFullDistribZkTestBase.java:561)
          [junit4:junit4]   1> 	at org.apache.solr.cloud.ChaosMonkeyShardSplitTest.indexr(ChaosMonkeyShardSplitTest.java:434)
          [junit4:junit4]   1> 	at org.apache.solr.cloud.ChaosMonkeyShardSplitTest$1.run(ChaosMonkeyShardSplitTest.java:158)
          [junit4:junit4]   1> Caused by: org.apache.solr.common.SolrException: Server at http://127.0.0.1:34311/h/y/collection1 returned non ok status:503, message:Service Unavailable
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:264)
          [junit4:junit4]   1> 	... 5 more
          

          Perhaps we should check the exception message and continue to count such a document?

           The second kind of test failure is one where a document add fails due to a version conflict. This exception is always seen just after "updateshardstate" is called to switch the shard states. Following is the relevant log:

          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state invoked for collection: collection1
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1 to inactive
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1_0 to active
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1_1 to active
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.873; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[169 (1432319507166134272)]} 0 2
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.884; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_shard1_1_replica1] webapp= path=/update params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2} {} 0 1
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.885; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2} {add=[169 (1432319507173474304)]} 0 2
          [junit4:junit4]   1> ERROR - 2013-04-14 19:05:26.885; org.apache.solr.common.SolrException; shard update error StdNode: http://127.0.0.1:41028/collection1_shard1_1_replica1/:org.apache.solr.common.SolrException: version conflict for 169 expected=1432319507173474304 actual=-1
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
          [junit4:junit4]   1> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
          [junit4:junit4]   1> 	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
          [junit4:junit4]   1> 	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
          [junit4:junit4]   1> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          [junit4:junit4]   1> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          [junit4:junit4]   1> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          [junit4:junit4]   1> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          [junit4:junit4]   1> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          [junit4:junit4]   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
          [junit4:junit4]   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          [junit4:junit4]   1> 	at java.lang.Thread.run(Thread.java:679)
          [junit4:junit4]   1> 
          [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.886; org.apache.solr.update.processor.DistributedUpdateProcessor; try and ask http://127.0.0.1:41028 to recover
          

           I'm not sure yet why a version conflict would happen or why it follows an "updateshardstate" command.

          Mark Miller added a comment -

          Set update log to buffering mode before it is published (fixes bug with extra doc count on sub-shard)

          Regarding those changes - I'd really like to find another way to do that.

           The original change around this made preRegister start taking a core rather than a core descriptor. I'd like to work that out so it doesn't need to be the case. That is where the core will find out some of its properties (shard id, core node name, perhaps more in the future). It would be nice if the core init code had access to this information - so it would be nice if we could call preRegister (or some refactored version) before actually creating the SolrCore.

          Shalin Shekhar Mangar added a comment -

          Committed three changes:

          1. Set update log to buffering mode before it is published (fixes bug with extra doc count on sub-shard)
           2. Use deleteIndex=true while unloading sub-shard cores (if a sub-shard in construction state already exists at the start of the splitshard operation) - see the sketch after this list
           3. Made ChaosMonkeyShardSplitTest consistent with ShardSplitTest - use the correct router and replica count, assert that sub-shards are active, parent shards are inactive, etc.
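
           As a rough illustration of the deleteIndex=true unload in point 2, the CoreAdmin call looks something like this (a sketch; host, port and core name are placeholders):

           curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard1_0_replica1&deleteIndex=true"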

          Anshum suggested over chat that we should think about combining ShardSplitTest and ChaosMonkeyShardSplit tests into one to avoid code duplication. I'll try to see if we can do that.

          Shalin Shekhar Mangar added a comment -

          Adding patch that I've committed to trunk and branch_4x.

          Changes:
           1. Changed the param name from "name" to "collection" for the splitshard API (example call below)
          2. Added a comment warning not to use shardState and shardRange from CloudDescriptor
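
           For example, with the committed parameter names the call looks something like this (host, port, collection and shard names are placeholders):

           curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1"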

          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk.

          Changes:

           1. The router and range of the current core are used during the split
           2. Changed sub-shard replication to check only the sub-shard range instead of the state
           3. Fixed the replica allocation code so that replicas are not always created on the same node
           4. Splitting a sub-shard works
           5. Slice state and range kept in CloudDescriptor are used one time only; DUPF and other places rely on the state/range read from ZK
           6. Removed multiple debug statements, sleeps and nocommits
          Anshum Gupta added a comment -

           All of the above-mentioned issues (and more) are now fixed.

          Shalin Shekhar Mangar added a comment -

           The sub-shard cores are created while the sub-shard is in the construction state, so their cloud descriptor keeps "construction" as the shard state. If the sub-shard leader goes down after the shard state has been changed to "active", it sets the shard state to "construction" once again while publishing itself as "down".

           I've fixed it in the git branch, although I don't like the fix very much. In the git branch, I'm using the shardState and shardRange fields in CloudDescriptor for one-time use. They are set to null once the new sub-shard core is registered (and the new sub-shard is created in ZK).

          Maybe shardState and shardRange should be a core property instead?

          Anshum Gupta added a comment -

          I've run into a few more issues while trying to improve the error handling/reporting.

           1. Splitting an existing sub-shard gets stuck. The new sub-sub-shards stay in the construction state forever.
           2. The replicas are almost always created on the same node. (Debugging/fixing that.)

          Shalin Shekhar Mangar added a comment -

           I ran into another bug. Adding mutable state to the cloud descriptor (like shard state and range) is a bad idea.

           The sub-shard cores are created while the sub-shard is in the construction state, so their cloud descriptor keeps "construction" as the shard state. If the sub-shard leader goes down after the shard state has been changed to "active", it sets the shard state to "construction" once again while publishing itself as "down".

          Shalin Shekhar Mangar added a comment -

          I'd like to commit the patch to 4x and trunk soon. We can then work on improving the features and tests via the regular route. If there are no objections, I'll commit it tomorrow.

          It's very common for these types of tests to be sensitive to the exact env (hardware, OS, etc). A lot of times it's some timing issue.

          Yeah, I'm still trying to reproduce the issue. I'll try to find a solution before I commit.

          Anshum Gupta added a comment -

           Mark, you're right, it seems like a timing issue. I don't think even Shalin has been able to recreate it very often, even under the same environment. Not even with the same seed.

          Anshum Gupta added a comment -

          You'd need to do a git merge and then compare it with the current branch.
          git fetch upstream
          git merge upstream/trunk
          git diff --no-prefix upstream/trunk

           This should show the diff. For now, I've just merged the current state of trunk into this branch, so getting the diff now should be straightforward.
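
           Another way to get the same diff without doing the local merge is git's three-dot syntax, which diffs the branch against its merge base with trunk (a sketch, assuming the upstream/origin remotes set up in Yonik's comment below):

           git fetch upstream
           git diff --no-prefix upstream/trunk...origin/trunk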

          Mark Miller added a comment -

          Was trying to look into it but strangely, I haven't run into it over 15 consecutive runs.

          It's very common for these types of tests to be sensitive to the exact env (hardware, OS, etc). A lot of times it's some timing issue.

          Mark Miller added a comment -

           AFAIK it's somewhat annoying - usually it involves doing a squash commit on a tmp branch and diffing against that if you want it nicely in one file/chunk. Otherwise, git format-patch can go back n commits and make a diff for each one, and you'd have to stitch them together.
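
           A rough sketch of the squash-commit approach (branch and file names are placeholders; this is just one way to do it):

           git checkout -b tmp-squash upstream/trunk
           git merge --squash origin/trunk
           git diff --cached --no-prefix > combined.patch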

          Yonik Seeley added a comment -

          Nice that this is on a git branch - no stale patches, and you can see the full history!

          Does anyone know an easy way to generate a diff?
          I did the following:

          git clone https://github.com/shalinmangar/lucene-solr.git lusolr_shardsplitting
          cd lusolr_shardsplitting
          git remote add upstream git://git.apache.org/lucene-solr.git
          git diff remotes/upstream/trunk remotes/origin/trunk
          

          But this does a diff with the current state of the trunk vs the branch. Any tips from the git wizards out there?

          Anshum Gupta added a comment -

          Was trying to look into it but strangely, I haven't run into it over 15 consecutive runs.

          Shalin Shekhar Mangar added a comment -

          Okay, the test still fails sometimes. I'm looking into it.

          Shalin Shekhar Mangar added a comment -

          Changes:

           1. Made splitshard retryable by unloading the cores that are part of a sub-shard and starting from scratch again. If a sub-shard is already in the active state, splitshard will fail.
           2. Used the PRERECOVERY core admin command to make the parent shard leader wait for the sub-shard cores to be active
           3. Similar to point 2 above, used the PRERECOVERY command to wait for sub-shard replicas to be available before switching the shard states
           4. The mismatch between the expected and actual number of documents was because SolrIndexSplitter does not take the type of the unique key field into account while calculating the hash. I changed it to use indexedToReadable so that it computes the correct hash. This needs to be reviewed for performance implications.
           5. ShardSplitTest passes.

          I'll be working to add more tests in the coming days.

          Mark Miller added a comment -

          Hope to take a look at what you guys have been up to again soon.

          Anshum Gupta added a comment -

           There are more changes on the branch, including a ChaosMonkey test for the feature. Any feedback on the design/strategy would be good.

          Also, I'm working on adding some more documentation on the general strategy somewhere in the code/package and improving the javadoc for the same as well.

          Shalin Shekhar Mangar added a comment -

          Changes:

          1. Fixed off-by-one mistake in replica creation
          2. Commit before split
          3. Better sub-shard node addition logic using shard ranges. Defensive checks in DUPF are modified to allow sub-shard replication. A new param distrib.from.shard is added to forwarded requests if node list contains sub-shard leaders.
          4. Modified ShardSplitTest to index docs constantly (to check sub-shard replication).

          The test fails currently because the sub-shards now have more docs than they should have. I'm investigating this.

          Patch is created on top of trunk r1459441 via git.

          Shalin Shekhar Mangar added a comment -

          Updated patch created by git on svn r1458857.

           The test passes – I had forgotten to add the shards or distrib=false param, and of course the document count was the same on the parent and sub-shards.
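
           For anyone checking per-core document counts by hand, a non-distributed query of this shape does the trick (a sketch; host and core name are placeholders):

           curl "http://localhost:8983/solr/collection1_shard1_0_replica1/select?q=*:*&rows=0&distrib=false"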

          Shalin Shekhar Mangar added a comment -

           Btw, the github fork is at https://github.com/shalinmangar/lucene-solr

          Shalin Shekhar Mangar added a comment -

          Patch updated over Mark's changes.

          Changes:
          1. Merged to latest trunk
           2. New way of sub-shard replication in DistributedUpdateProcessor
          3. New operation "updateshardstate" in Overseer to change logical state of a shard
          4. Many bug fixes
          5. A slightly better test – still ways to go

           Anshum and I have been collaborating on building this feature. Since the patch was getting difficult to maintain without overwriting each other's changes, we created a fork on GitHub to develop this feature. The attached patch is a diff against revision 1457757 of trunk (created by git).

           The test case fails right now because the number of documents in the sub-shards is not correct (it is the same as in the parent shard). Strangely, even if we disable the index-splitting part of the code, the number of documents in the parent and sub-shards is equal, which makes me believe that there is a bug in core creation or in the way the test is set up. I'm looking into that right now.

           The switch to make the parent shard inactive and the sub-shards active is also not correct because it doesn't wait for the sub-shard replicas to recover completely. We need to figure out a way to wait for the replicas to be up and running before we switch.

           The way DUPF forwards updates to sub-shard replicas may also open us up to race conditions when we switch shard states.

          Mark Miller added a comment -

          SOLR-4570 filed as well - another issue with a solution in my patch.

          Mark Miller added a comment -

          SOLR-4569 is another small improvement issue I'll pull from my patch here.

          Mark Miller added a comment -

          SOLR-4568 is another issue I found while working on this - I'll pull the fix from my patch to SOLR-4568.

          Anshum Gupta added a comment -

          Thanks for the suggestions on that one Mark. I'll put up a patch soon for SOLR-4566 on the lines of what we discussed and what you've mentioned above.

           Though again, as we're not really using the states anywhere but in the patch for ShardSplitting, it should have no impact.
           However, just as a note, any future use of getSlices would mean handling inactive slices (or calling getActiveSlices), so the behaviour would change a bit (as we start using more of the Slice states).

          Mark Miller added a comment -

          So to summarize - to fix this current problem, I think we want to rework the current slice state stuff in trunk - I left open SOLR-4566 for the moment.

           I think the cleanest thing for this API, and what will help keep the current issue from recurring, is if we change getSlices and getSliceMap to always return all slices.

           Then we add getActiveSlices and getActiveSliceMap, and appropriate calls are changed to that. Then there are likely to be fewer surprises when we try to copy/update the clusterstate.

          Mark Miller added a comment - - edited

           Anshum caught me up in chat - I am actually the one who is confused, because the slice state stuff has already been committed. I thought I was looking at pre-shard-splitting trunk code.

           The real problem here is how the slice state is being handled in relation to clusterstate.json updates - you can lose inactive slices from the clusterstate, which will cause havoc.

          Mark Miller added a comment -

          We wouldn't want any shard assignment/replica addition normally to go to a non-active Slice. I think changing the AssignShard to use getAllSlices may do what we're trying to avoid here.

          No, I think you are confusing slice/shard state.

          Mark Miller added a comment -

          Here is another patch layered on top of the committed SOLR-4566 work.

          Anshum Gupta added a comment -

          We wouldn't want any shard assignment/replica addition normally to go to a non-active Slice. I think changing the AssignShard to use getAllSlices may do what we're trying to avoid here.

          Mark Miller added a comment -

          I filed SOLR-4566 for the main issue.

          Mark Miller added a comment -

          Well that was a b**ch

          Here is a patch that should unblock you. I'll probably spin my changes into one or two other JIRA's.

           There were some recent changes, during some of the cluster state refactoring, around what slices are returned when. It's the first time I've stumbled upon it, and I'm not sure what I think of it; it seems somewhat easy to screw up. Anyway, for now I've left it and added some changes to get this working better. It still fails in the split code, but it should get you to new ground.

          Mark Miller added a comment -

           I think "collection" might be a better param name than "name" for the shard split API.

          Anshum Gupta added a comment -

          Added replica creation to the earlier 'combined' patch that Shalin had put up. This is yet to be tested as we're yet to fix the 2nd core creation issue.

          Mark Miller added a comment -

          ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:326)

          You probably need to avoid this call in this case - it's really for starting up a cold previously started cluster. It's a little odd that it hits on the second sub shard and not the first. I might be able to take a closer look when I get some free time.

          Shalin Shekhar Mangar added a comment -

          Attached is a first iteration of the patch. It is not working yet but a lot of the pieces required for this feature are coming together in this patch. Soon, I'd like to split it into smaller, committable issues/patches.

          This is a very rough cut but any comments/suggestions on the general approach and/or the patch will be very helpful.

          Changes:

          • OverseerCollectionProcessor has a new command called splitshard which executes the whole strategy
          • Implicit support for creating new shards (by creating a core with a new shard in "construction" state)
          • "Construction" state is introduced for shards
           • CoreAdmin has a new APPLYBUFFEREDUPDATES task which, as the name says, instructs a core to apply all buffered updates (a hypothetical call is sketched below)
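
           A hypothetical CoreAdmin call for that task, using the naming in this patch (the action and parameter names here are assumptions and may differ from what is finally committed):

           curl "http://localhost:8983/solr/admin/cores?action=APPLYBUFFEREDUPDATES&name=collection1_shard1_0_replica1"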

          TODO:

          • Create replicas for new sub-shards
          • Figure out how and where to set sub-shards as active and divert traffic away from parent shard
          • Tests (both unit and ChaosMonkey based)

           Right now, I'm running into a bug where the first sub-shard creation goes through but the second sub-shard coreadmin "CREATE" command blocks in ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:326)

          Can anyone tell me what I've overlooked?

          Shalin Shekhar Mangar added a comment -

          I'd like to suggest supporting only a single shard through this API.

          +1

          Anshum Gupta added a comment -

           I'd like to suggest supporting only a single shard through this API. It may be called multiple times for more than one shard.

          In the future however, we may want to have a split API call which splits all existing shards, but that could be a different thing (if required).

          Shalin Shekhar Mangar added a comment -

          How do we know what collection? I assume there will be a "collection" parameter?

          Yes, a collection param will also be present.

          shard.keys is currently used in routing request (and the values are often not shard names), so we probably shouldn't overload it here. After all, it may make sense in the future to be able to use shard.keys to specify which shard you want to split!

          Yes! That is exactly the thinking behind shard.keys here. It is not being overloaded but used to indicate which shard to split by specifying the key which resolves to a shard name.

          Related: SOLR-4503 - we now have the capability to use restlet, and should consider doing so for new APIs like this.

          I'm not familiar with restlet. I'll take a look at it.

          Yonik Seeley added a comment -

          "The collections api may be invoked as follows:
          http://host:port/solr/admin/collections?action=SPLIT&shard=<shard_1>&shard=<shard_2>

          Ok, I assume this is for splitting more than one shard (i.e. both shard_1 and shard_2 in this example will be split?)
          How do we know what collection? I assume there will be a "collection" parameter?

          Related: SOLR-4503 - we now have the capability to use restlet, and should consider doing so for new APIs like this.

          Sometimes, shard names are automatically assigned by SolrCloud and it may be more convenient for users to specify shards by shard keys instead of shard names e.g.
          http://host:port/solr/admin/collections?action=SPLIT&shard.keys=shardKey1,shardKey2"

          shard.keys is currently used in routing requests (and the values are often not shard names), so we probably shouldn't overload it here. After all, it may make sense in the future to be able to use shard.keys to specify which shard you want to split!

          Anshum Gupta added a comment -

          Any suggestions/feedback on the earlier comment about the Collections API would be good. Here's what the collections API call(s) would look like:

          "The collections api may be invoked as follows:
          http://host:port/solr/admin/collections?action=SPLIT&shard=<shard_1>&shard=<shard_2>

          Sometimes, shard names are automatically assigned by SolrCloud and it may be more convenient for users to specify shards by shard keys instead of shard names e.g.
          http://host:port/solr/admin/collections?action=SPLIT&shard.keys=shardKey1,shardKey2"

          Commit Tag Bot added a comment -

          [trunk commit] Shalin Shekhar Mangar
          http://svn.apache.org/viewvc?view=revision&revision=1447516

          SOLR-3755: Do not create core on split action, use 'targetCore' param instead

          Commit Tag Bot added a comment -

          [branch_4x commit] Shalin Shekhar Mangar
          http://svn.apache.org/viewvc?view=revision&revision=1447517

          SOLR-3755: Do not create core on split action, use 'targetCore' param instead

          Anshum Gupta added a comment -

          Change for CoreAdmin split so that it does not create cores. It accepts the names of the target cores into which the split index should be merged.

          Commit Tag Bot added a comment -

          [branch_4x commit] Shalin Shekhar Mangar
          http://svn.apache.org/viewvc?view=revision&revision=1444398

          SOLR-3755: Test for SolrIndexSplitter

          Commit Tag Bot added a comment -

          [trunk commit] Shalin Shekhar Mangar
          http://svn.apache.org/viewvc?view=revision&revision=1444397

          SOLR-3755: Test for SolrIndexSplitter

          Shalin Shekhar Mangar added a comment -

          A simple test for SolrIndexSplitter is attached

          Shalin Shekhar Mangar added a comment -

          Thinking more about the general strategy that Yonik devised for this feature, here is a rough draft of how it might go.

          A split operation is triggered via the collections API:

          • The Overseer Collection Processor (CP) creates the new sub-shards in ZK in the "Construction" state, such that the first node to join a sub-shard becomes its leader and thereafter leaders are not elected automatically. Replicas are not automatically created in the “Construction” state.
          • CP creates new cores on the leader node using the core/cloud descriptors of the parent core.
            • Such cores are automatically designated as leaders of their respective sub-shards.
            • These new cores join their sub-shards in update-buffering mode and keep themselves in that mode until the shard changes its state.
            • The distributed update processor (DUPF) on the parent forwards only the relevant updates to each sub-shard core.
          • CP calls CoreAdmin split on the leader of the parent shard.
            • A hard commit is issued and the index is split and written into the correct cores.
          • CP puts the sub-shards into the Recovery state.
            • Sub-shard cores go into apply-buffered-updates mode.
            • CP puts a watch on the sub-shard cores' status.
          • Once a sub-shard core's status becomes active, the Overseer creates replicas and watches their state.
          • Once enough replicas (is ceil(numReplicas/2) enough?) have recovered for all sub-shards, atomically set the sub-shards active and the parent shard inactive.

          Suggestions/comments welcome.
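
          A minimal, hypothetical sketch of the last check in the flow above: only flip the sub-shards to active (and the parent to inactive) once at least ceil(numReplicas/2) replicas of every sub-shard have recovered. The class and method names here are illustrative, not actual Solr code.

          import java.util.Map;

          // Hypothetical sketch of the final step above: decide whether the atomic
          // parent-inactive / sub-shards-active switch may happen yet.
          public class SubShardActivationCheck {

            /** Threshold proposed above: ceil(numReplicas / 2). */
            static int requiredRecovered(int numReplicas) {
              return (numReplicas + 1) / 2;
            }

            /**
             * @param recoveredPerSubShard sub-shard name -> number of replicas already recovered
             * @param numReplicas          configured replication factor
             */
            static boolean readyToSwitch(Map<String, Integer> recoveredPerSubShard, int numReplicas) {
              int needed = requiredRecovered(numReplicas);
              for (int recovered : recoveredPerSubShard.values()) {
                if (recovered < needed) return false;   // some sub-shard is still catching up
              }
              return !recoveredPerSubShard.isEmpty();
            }
          }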

          Shalin Shekhar Mangar added a comment -

          We need to introduce shard states into the design. SolrCloud shards are always “active” i.e. no state information is associated with shards presently. I'm planning to add two new states viz. “Construction” and “Recovery” besides the default “Active” state.

          A shard in “Construction” state has the following properties:

          • Shard nodes receive no queries
          • Shard nodes receive no updates except those forwarded by leaders
          • Overseer does not allocate nodes to such a shard automatically
          • Leader election is disabled for such a shard
          • Shard nodes automatically go into recovering state (buffering update mode)

          A shard in the “Recovery” state is similar to a shard in the "Construction" state, except that its nodes automatically go into recovering state in “APPLYING_BUFFERED” mode and, once that completes, into the active state.

          We could merge the two states together if necessary once we start implementing stuff.
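
          A minimal sketch of how the proposed shard states might be modeled. The enum name and values are illustrative only; the actual committed code may use different names or plain string constants.

          // Illustrative only: a compact model of the shard states proposed above.
          public enum ProposedSliceState {
            ACTIVE,        // default: receives queries and updates, normal leader election
            CONSTRUCTION,  // sub-shard being built: no direct queries, only forwarded updates,
                           // no automatic replica assignment or leader election
            RECOVERY;      // sub-shard applying buffered updates before going active

            /** Whether ordinary (non-forwarded) client traffic should reach this slice. */
            public boolean acceptsDirectTraffic() {
              return this == ACTIVE;
            }
          }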

          Shalin Shekhar Mangar added a comment -

          I'm planning to split the "split" API into a high-level collections API and a low-level core admin API.

          The low-level CoreAdmin API need not be SolrCloud-aware and would work entirely on a single node. This way, a node that is not part of a SolrCloud cluster can also make use of the index splitting feature.

          The Core Admin command may be invoked as:

          http://host:port/solr/admin/cores?core=<core_name>&action=SPLIT&path=/path/1&path=/path/2&path=/path/3

          or, as:

          http://host:port/solr/admin/cores?core=<core_name>&action=SPLIT&targetCore=core1&targetCore=core2

          The "path" and "targetCore" parameter is multi-valued and the collection will be split into as many number of sub-shards as the number of "path" values.

          The collections api may be invoked as follows:

          http://host:port/solr/admin/collections?action=SPLIT&shard=<shard_1>&shard=<shard_2>

          Sometimes, shard names are automatically assigned by SolrCloud and it may be more convenient for users to specify shards by shard keys instead of shard names e.g.

          http://host:port/solr/admin/collections?action=SPLIT&shard.keys=shardKey1,shardKey2
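
          A minimal sketch of invoking the proposed CoreAdmin SPLIT command from Java over plain HTTP. The host, port, and core names are placeholders, and the parameter names simply mirror the URLs proposed above; they may change before the feature is committed.

          import java.io.InputStream;
          import java.net.HttpURLConnection;
          import java.net.URL;
          import java.net.URLEncoder;

          public class CoreSplitCall {
            public static void main(String[] args) throws Exception {
              // Placeholder values; parameter names mirror the proposed CoreAdmin API above.
              String url = "http://localhost:8983/solr/admin/cores?action=SPLIT"
                  + "&core=" + URLEncoder.encode("collection1", "UTF-8")
                  + "&targetCore=" + URLEncoder.encode("collection1_0", "UTF-8")
                  + "&targetCore=" + URLEncoder.encode("collection1_1", "UTF-8");

              HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
              InputStream in = conn.getInputStream();   // throws IOException on HTTP errors
              try {
                while (in.read() != -1) { /* drain the response body */ }
              } finally {
                in.close();
              }
              System.out.println("SPLIT returned HTTP " + conn.getResponseCode());
            }
          }
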
          Dmitry Kan added a comment -

          "Somewhat related: control naming of shards. This could be applicable for both hashing based collections and custom sharding based collections. shardNames=myshard1,myshard2,myshard3?"

          Would this suit logical (e.g. date-based) sharding as well? Do you plan to support such a sharding type in the current shard-splitting implementation? Not sure if this helps: we have implemented our own custom date-based sharding (splitting and routing) for Solr 3.x and found it to be the most logical way of sharding our data (both from the load-balancing and the use-case point of view). The routing implementation is done by loading a custom shards config file that contains a mapping of date ranges to shards.

          Yonik Seeley added a comment -

          OK, after chatting w/ Mark a bit, this seems to be his use-case: A pre-configured cluster w/ no information yet in ZK.
          Currently implemented via:

          • configuring the collection & shard of each core in solr.xml
          • bring all of those cores up
          • start indexing (and in 4.0 style, the correct shard is picked via hashing and splitting up the range according to the currently known shards)

          This 4.0 behavior could be replicated via a "lazyHash" router that simply splits the hash range over the currently known shards at the time of every request. This is fragile and error-prone for many users of course, so it would not be a default. Additionally, we would need code to explicitly specify the router for a collection (assuming the collection had not already been created).

          Somewhat related: control naming of shards. This could be applicable for both hashing based collections and custom sharding based collections. shardNames=myshard1,myshard2,myshard3?
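
          A minimal sketch, purely illustrative, of how a "lazyHash"-style router could carve the full 32-bit hash space evenly over the shards known at request time; Solr's actual routing code is not shown here, and the hash function is a placeholder.

          // Illustrative sketch: split the full signed 32-bit hash space evenly over
          // N currently known shards, as a "lazyHash" router might do on every request.
          public class LazyHashSketch {

            /** Returns the index of the shard whose slice of the hash space contains hash. */
            static int shardFor(int hash, int numShards) {
              long span = 1L << 32;                            // total number of hash values
              long width = span / numShards;                   // size of each shard's slice
              long offset = (long) hash - Integer.MIN_VALUE;   // shift into 0 .. 2^32-1
              return (int) Math.min(numShards - 1, offset / width); // last shard absorbs remainder
            }

            public static void main(String[] args) {
              int hash = "mydoc!1".hashCode();                 // placeholder hash function
              System.out.println("doc routes to shard " + shardFor(hash, 4));
            }
          }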

          Mark Miller added a comment -

          This has a back-compat break that we should address somehow, or at least mention in CHANGES: previously you could specify explicit shard ids and still get distributed updates; now if you do that, you won't get distributed updates because the shards won't be assigned ranges.

          Radim Kolar added a comment -

          Useful theory about rehashing: http://en.wikipedia.org/wiki/Consistent_hashing

          Yonik Seeley added a comment -

          Make Slice subclass ZkNodeProps

          After a lot of code modification, I've realized that "ZkNodeProps" was probably supposed to be the same as "Replica". I was fooled into thinking it was generic properties in ZK on any type of node (slice, replica, or whatever), and that was reinforced by its use in other contexts as generic properties (messages in the overseer queue use ZkNodeProps as general properties - Overseer.java:125).

          Given that Node also has another meaning (A Node is a CoreContainer/JVM that can contain multiple cores), I'm leaning toward renaming ZkNodeProps to Replica, and making a truly generic class ZkProps that Replica, Slice, etc, can subclass from.

          Here's an example of the types of code changes I've been making to hopefully make things more readable:

          -    for (Map.Entry<String,Slice> entry : slices.entrySet()) {
          -      Slice slice = entry.getValue();
          -      Map<String,ZkNodeProps> shards = slice.getShards();
          -      Set<Map.Entry<String,ZkNodeProps>> shardEntries = shards.entrySet();
          -      for (Map.Entry<String,ZkNodeProps> shardEntry : shardEntries) {
          -        final ZkNodeProps node = shardEntry.getValue();
          -        if (clusterState.liveNodesContain(node.get(ZkStateReader.NODE_NAME_PROP))) {
          -          return new ZkCoreNodeProps(node).getCoreUrl();
          +    for (Slice slice : slices.values()) {
          +      for (Replica replica : slice.getReplicas()) {
          +        if (clusterState.liveNodesContain(replica.get(ZkStateReader.NODE_NAME_PROP))) {
          +          return new ZkCoreNodeProps(replica).getCoreUrl();
          

          Unfortunately, when I was all done, the ZK-related tests were no longer passing.
          I'm going to make another attempt and see if I can make more incremental changes (so that I can run tests periodically).

          Yonik Seeley added a comment -

          I've run into a few impedance mismatch issues implementing the JSON above.
          Internally we seem to use ZkNodeProps which accepts Map<String,String>... but a JSON Map is better represented as a Map<String,Object>.

          I think I'll try going in the following direction:

          • Make a ZkNodeProps that accepts Map<String,Object> as properties, and can thus represent integers and more complex types. This will be just like a Map, but with some convenience methods added
          • Make Slice subclass ZkNodeProps
          • Make a new Replica class (instead of just representing it as a generic ZkNodeProps)

          In general, to construct these classes from JSON, it seems like we should just pass the Map<String,Object> generated from the JSON parser and then the constructor can pull out key elements and construct sub-elements.

          Thoughts?
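
          A minimal sketch of the "generic properties" direction described above. The class here is hypothetical; the real Solr classes (ZkNodeProps, Slice, Replica) may end up looking quite different.

          import java.util.Collections;
          import java.util.LinkedHashMap;
          import java.util.Map;

          // Hypothetical sketch: a generic property holder built from the Map<String,Object>
          // produced by a JSON parser, with a few convenience accessors.
          public class ZkPropsSketch {
            protected final Map<String, Object> props;

            public ZkPropsSketch(Map<String, Object> props) {
              this.props = Collections.unmodifiableMap(new LinkedHashMap<String, Object>(props));
            }

            public Object get(String key) {
              return props.get(key);
            }

            public String getStr(String key) {
              Object o = props.get(key);
              return o == null ? null : o.toString();
            }

            public int getInt(String key, int def) {
              Object o = props.get(key);
              return o instanceof Number ? ((Number) o).intValue() : def;
            }
          }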

          Grant Ingersoll added a comment -

          +1 on the first option. I think it's considered good JSON practice to have key names not contain state, but I can't remember where I saw that.

          Mark Miller added a comment -

          Yeah, I like the first option as well.

          Yonik Seeley added a comment -

          It seems like we need logical shard parameters (i.e. Slice class), but we don't currently have a place for them.
          These parameters would include:

          • collection (this is somewhat redundant, but belongs more on a slice than on a replica)
          • replication factor (i.e. in time based sharding, one may want more replicas of recent shards to handle greater query throughput)
          • hash range(s) covered by the slice
          • maybe a pointer to the leader, rather than having to search through the nodes?

          You can see the previous structure of cloudstate from my previous message.

          One fix is to introduce a "nodes" or "replicas" level to contain the nodes and leave the other properties as top-level:

            "shard1": {
              "replication_factor" : 3,
              "range" : "00000000-3fffffff",
              "nodes" : {
                "Rogue:8983_solr_collection1":{
                  "state" : "active"
                }
              }
            }
          

          Another way is to introduce a "props" to store properties:

            "shard1": {
              "props" : {
                "replication_factor" : 3,
                "range" : "00000000-3fffffff"
              },
              "Rogue:8983_solr_collection1":{
                "state" : "active"
              }
            }
          

          The first option feels more natural to me - properties are directly under the shard, and the nodes of a shard are simply another property.

          Yonik Seeley added a comment -

          Just committed some more progress.

          http://svn.apache.org/viewvc?rev=1380287&view=rev

          I started up a ZK cluster with one shard, one node.
          curl "http://localhost:8983/solr/admin/cores?core=collection1&action=SPLIT"

          The cloud state afterwards looks like this:

          {"collection1":{
              "shard1":{"Rogue:8983_solr_collection1":{
                  "shard":"shard1",
                  "roles":null,
                  "leader":"true",
                  "state":"active",
                  "core":"collection1",
                  "collection":"collection1",
                  "node_name":"Rogue:8983_solr",
                  "base_url":"http://Rogue:8983/solr"}},
              "shard1_0":{"Rogue:8983_solr_collection1_0":{
                  "shard":"shard1_0",
                  "leader":"true",
                  "roles":null,
                  "state":"active",
                  "core":"collection1_0",
                  "collection":"collection1",
                  "node_name":"Rogue:8983_solr",
                  "base_url":"http://Rogue:8983/solr"}},
              "shard1_1":{"Rogue:8983_solr_collection1_1":{
                  "shard":"shard1_1",
                  "roles":null,
                  "leader":"true",
                  "state":"active",
                  "core":"collection1_1",
                  "collection":"collection1",
                  "node_name":"Rogue:8983_solr",
                  "base_url":"http://Rogue:8983/solr"}}}}
          

          The original core had 32 docs. After I did a manual commit on both of the new cores, the first showed 14 docs and the second 18.

          Yonik Seeley added a comment - - edited

          So we need to have the new cores up and running, and then install the new indexes in them.
          We could either do it like replication and use a new index directory (with a properties file redirecting to that latest index), or we could try to make sure that there is no open writer on the new core and then split directly into the normal core index directory.
          edit: we could also simply use the existing writers on the new cores to "merge in" the new indexes (we just need to ensure that they are empty).
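
          A minimal sketch, assuming Lucene 4.x-era APIs, of the "merge in" idea above: use the target core's (assumed empty) IndexWriter to pull a split partition in via addIndexes(Directory...). The directory paths and analyzer choice are placeholders, not what Solr itself would do.

          import java.io.File;
          import org.apache.lucene.analysis.standard.StandardAnalyzer;
          import org.apache.lucene.index.IndexWriter;
          import org.apache.lucene.index.IndexWriterConfig;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FSDirectory;
          import org.apache.lucene.util.Version;

          public class MergeSplitPartition {
            public static void main(String[] args) throws Exception {
              // Placeholder paths: the empty target core's index and one split partition.
              Directory target = FSDirectory.open(new File("/path/to/targetCore/data/index"));
              Directory partition = FSDirectory.open(new File("/tmp/1"));

              IndexWriterConfig cfg =
                  new IndexWriterConfig(Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43));
              IndexWriter writer = new IndexWriter(target, cfg);
              try {
                // addIndexes(Directory...) copies the partition's segments into the target
                // index; the partition must not have an open IndexWriter of its own.
                writer.addIndexes(partition);
                writer.commit();
              } finally {
                writer.close();
              }
            }
          }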

          Yonik Seeley added a comment -

          I am just confused, why does it not use FixedBitSet?

          Habit... OpenBitSet is just the class I'm used to and the one my fingers automatically type.

          The merged indexes will have no deletions at all, because it merges not copies.

          Cool, thanks for the clarification - I'll update the comment in my local copy.

          Uwe Schindler added a comment -
          // TODO: will many deletes have been removed, or should we optimize?
          

          The merged indexes will have no deletions at all, because this merges rather than copies. IndexWriter.addIndexes(IndexReader...) does the same as a standard Lucene merge, while IndexWriter.addIndexes(Directory...) just copies the segment files. This is a plain, dumb merge of a segment that has additional, overlaid deletions.

          Uwe Schindler added a comment - - edited

          Hi Yonik,
          Looks nice; it is similar to oal.index.MultiPassIndexSplitter / PKIndexSplitter in the misc module, just using hash-partitioned liveDocs. I am just confused: why does it not use FixedBitSet? The length is fixed and no (int) casts are needed.

          +1 otherwise
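
          A minimal, hypothetical sketch of the "hash-partitioned liveDocs" idea being discussed here, using FixedBitSet as suggested and assuming Lucene 4.x-era APIs. The hashing and unique-key handling are placeholders, not the actual SolrIndexSplitter logic.

          import org.apache.lucene.index.AtomicReader;
          import org.apache.lucene.index.DocsEnum;
          import org.apache.lucene.index.Terms;
          import org.apache.lucene.index.TermsEnum;
          import org.apache.lucene.util.BytesRef;
          import org.apache.lucene.util.FixedBitSet;

          // Hypothetical sketch: mark the documents of one leaf reader that belong to a
          // given partition by hashing the unique-key term. The resulting FixedBitSet
          // could then be overlaid as liveDocs when merging the reader into a target index.
          public class HashPartitionBits {
            static FixedBitSet partitionDocs(AtomicReader reader, String idField,
                                             int partition, int numPartitions) throws Exception {
              FixedBitSet bits = new FixedBitSet(reader.maxDoc());
              Terms terms = reader.terms(idField);
              if (terms == null) return bits;
              TermsEnum te = terms.iterator(null);
              DocsEnum docs = null;
              for (BytesRef term = te.next(); term != null; term = te.next()) {
                // Placeholder hash; Solr's splitter hashes the id/shard key differently.
                int hash = term.hashCode() & Integer.MAX_VALUE;
                if (hash % numPartitions == partition) {
                  docs = te.docs(reader.getLiveDocs(), docs);
                  for (int doc = docs.nextDoc(); doc != DocsEnum.NO_MORE_DOCS; doc = docs.nextDoc()) {
                    bits.set(doc);
                  }
                }
              }
              return bits;
            }
          }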

          Yonik Seeley added a comment -

          Since this doesn't change any existing functionality, I've committed what I have now to enable easier integration/modification by others.

          Yonik Seeley added a comment -

          OK, here's a patch with minimal functionality that seems to work (no tests yet though):

          curl "http://localhost:8983/solr/admin/cores?core=collection1&action=SPLIT&path=/tmp/1&path=/tmp/2"
          

          That command will split the index in two. It will not consider the existing "range" of the current core or create new cores or anything like that yet... it only splits the index and writes the two halves to separate directories.

          Yonik Seeley added a comment -

          Draft (unfinished) patch just to let people know where I'm going...

          Yonik Seeley added a comment - - edited

          We need to associate hash ranges with shards and allow overlapping shards (i.e. 1-10 coexisting with 1-5 and 6-10).

          General Strategy for splitting w/ no service interruptions:

          • Bring up 2 new cores on the same node, covering the new hash ranges
          • Both cores should go into recovery mode (i.e. the leader should start forwarding updates)
          • Leaders either need to consider these new smaller shards as replicas, or they need to forward to the "leader" of each new smaller shard
          • Searches should no longer go across all shards, but should just span the complete hash range
          • The leader does a hard commit and splits the index
          • The smaller indexes are installed on the new cores
          • The Overseer should create new replicas for the new shards
          • Mark the old shard as “retired”, with some mechanism to shut it down (after there is an acceptable amount of coverage of the new shards via replicas)

          Future: allow splitting even with “custom” shards
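
          A minimal sketch of the range bookkeeping implied above: splitting one shard's hash range into two contiguous sub-ranges that together cover the parent (e.g. 00000000-3fffffff into 00000000-1fffffff and 20000000-3fffffff). The representation here is illustrative, not Solr's actual range class.

          // Illustrative only: split a parent hash range [min, max] into two contiguous
          // halves, as shard splitting needs to do before the new sub-shard cores exist.
          public class RangeSplitSketch {
            static int[][] splitInTwo(int min, int max) {
              long mid = ((long) min + (long) max) / 2;   // long arithmetic avoids int overflow
              return new int[][] {
                  { min, (int) mid },                     // first sub-range
                  { (int) mid + 1, max }                  // second sub-range
              };
            }

            public static void main(String[] args) {
              for (int[] r : splitInTwo(0x00000000, 0x3fffffff)) {
                System.out.printf("%08x-%08x%n", r[0], r[1]);
              }
            }
          }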


            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Yonik Seeley
            • Votes: 13
            • Watchers: 32
