SOLR-5473: Split clusterstate.json per collection and watch states selectively
Sub-task of SOLR-5381: Split Clusterstate and scale

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, Trunk
    • Component/s: SolrCloud
    • Labels:

      Description

      As defined in the parent issue, store the state of each collection under the /collections/collectionname/state.json node and watch state changes selectively.

      https://reviews.apache.org/r/24220/
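
      For illustration only, a minimal sketch of the per-collection watch pattern described above, using the plain ZooKeeper client. The znode path follows the description and the error handling is elided, so this is an assumption-laden sketch rather than the patch's actual code:

        import org.apache.zookeeper.WatchedEvent;
        import org.apache.zookeeper.Watcher;
        import org.apache.zookeeper.ZooKeeper;
        import org.apache.zookeeper.data.Stat;

        public class CollectionStateWatchSketch {
          // Watch only one collection's state node instead of the global clusterstate.json.
          public static byte[] readAndWatch(final ZooKeeper zk, final String collection) throws Exception {
            final String path = "/collections/" + collection + "/state.json"; // path per the description
            final Stat stat = new Stat();
            byte[] data = zk.getData(path, new Watcher() {
              @Override
              public void process(WatchedEvent event) {
                try {
                  // Re-read and re-register; changes to other collections never fire this watch.
                  readAndWatch(zk, collection);
                } catch (Exception e) {
                  // a real implementation would log and retry
                }
              }
            }, stat);
            // stat.getVersion() is the ZooKeeper version of this collection's state
            return data;
          }
        }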

      1. SOLR-5473.patch
        51 kB
        Noble Paul
      2. SOLR-5473.patch
        63 kB
        Noble Paul
      3. SOLR-5473.patch
        43 kB
        Noble Paul
      4. SOLR-5473.patch
        46 kB
        Noble Paul
      5. SOLR-5473.patch
        46 kB
        Noble Paul
      6. SOLR-5473.patch
        48 kB
        Noble Paul
      7. SOLR-5473.patch
        52 kB
        Noble Paul
      8. SOLR-5473.patch
        51 kB
        Noble Paul
      9. SOLR-5473.patch
        53 kB
        Noble Paul
      10. SOLR-5473.patch
        54 kB
        Noble Paul
      11. SOLR-5473.patch
        57 kB
        Noble Paul
      12. SOLR-5473.patch
        55 kB
        Noble Paul
      13. SOLR-5473.patch
        60 kB
        Noble Paul
      14. SOLR-5473.patch
        53 kB
        Noble Paul
      15. SOLR-5473.patch
        40 kB
        Noble Paul
      16. SOLR-5473.patch
        52 kB
        Noble Paul
      17. SOLR-5473.patch
        54 kB
        Noble Paul
      18. SOLR-5473.patch
        70 kB
        Noble Paul
      19. SOLR-5473-74.patch
        87 kB
        Noble Paul
      20. SOLR-5473-74.patch
        70 kB
        Noble Paul
      21. ec2-50-16-38-73_solr.log
        493 kB
        Timothy Potter
      22. ec2-23-20-119-52_solr.log
        1.84 MB
        Timothy Potter
      23. SOLR-5473-74.patch
        74 kB
        Noble Paul
      24. SOLR-5473-74.patch
        75 kB
        Noble Paul
      25. SOLR-5473-74.patch
        76 kB
        Noble Paul
      26. SOLR-5473-74.patch
        79 kB
        Noble Paul
      27. SOLR-5473-74.patch
        79 kB
        Noble Paul
      28. SOLR-5473-74.patch
        84 kB
        Noble Paul
      29. SOLR-5473-74.patch
        73 kB
        Shalin Shekhar Mangar
      30. SOLR-5473-74.patch
        71 kB
        Shalin Shekhar Mangar
      31. SOLR-5473-74.patch
        71 kB
        Shalin Shekhar Mangar
      32. SOLR-5473-74.patch
        74 kB
        Shalin Shekhar Mangar
      33. SOLR-5473-74.patch
        78 kB
        Shalin Shekhar Mangar
      34. SOLR-5473-74.patch
        76 kB
        Noble Paul
      35. SOLR-5473-74.patch
        80 kB
        Noble Paul
      36. SOLR-5473-74.patch
        79 kB
        Shalin Shekhar Mangar
      37. SOLR-5473-74.patch
        86 kB
        Shalin Shekhar Mangar
      38. SOLR-5473-74.patch
        91 kB
        Noble Paul
      39. SOLR-5473-74.patch
        92 kB
        Noble Paul
      40. SOLR-5473-74.patch
        96 kB
        Noble Paul
      41. SOLR-5473-74.patch
        97 kB
        Shalin Shekhar Mangar
      42. SOLR-5473-74.patch
        100 kB
        Shalin Shekhar Mangar
      43. SOLR-5473-74.patch
        95 kB
        Noble Paul
      44. SOLR-5473-74.patch
        19 kB
        Noble Paul
      45. SOLR-5473_undo.patch
        73 kB
        Noble Paul
      46. SOLR-5473-74.patch
        40 kB
        Noble Paul
      47. SOLR-5473-configname-fix.patch
        4 kB
        Shalin Shekhar Mangar
      48. SOLR-5473-74 .patch
        40 kB
        Noble Paul
      49. SOLR-5473-74.patch
        84 kB
        Noble Paul
      50. SOLR-5473-74.patch
        84 kB
        Noble Paul
      51. SOLR-5473-74.patch
        87 kB
        Noble Paul
      52. SOLR-5473-74.patch
        88 kB
        Noble Paul
      53. SOLR-5473-74.patch
        87 kB
        Noble Paul
      54. SOLR-5473-74_POC.patch
        27 kB
        Noble Paul
      55. SOLR-5473-74.patch
        83 kB
        Noble Paul
      56. SOLR-5473-74.patch
        83 kB
        Noble Paul
      57. SOLR-5473-74.patch
        83 kB
        Noble Paul
      58. SOLR-5473.patch
        82 kB
        Mark Miller
      59. SOLR-5473_no_ui.patch
        78 kB
        Timothy Potter
      60. SOLR-5473.patch
        81 kB
        Noble Paul

          Activity

          Noble Paul added a comment -

          Don't run the tests yet

          What does the patch do?

          • Collection create accepts an extra parameter external=true/false; default=false.
          • If external=true, the collection will have no entry in clusterstate.json.
          • The clusterstate for all such 'external' collections will be stored at /collections/<collection-name>/state.
          • DocCollection#isExternal() can tell you whether a collection is external or not.
          • Use DocCollection#getVersion() to get the ZK version of the clusterstate for that object in ZK.
          • A node will only listen to the collections where it is a member.
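
          To make the new accessors concrete, a hypothetical usage sketch (not part of the patch; it only assumes isExternal() and getVersion() behave as described in the bullets above):

            import org.apache.solr.common.cloud.ClusterState;
            import org.apache.solr.common.cloud.DocCollection;
            import org.apache.solr.common.cloud.ZkStateReader;

            class ExternalCollectionSketch {
              // Sketch: inspect a collection created with external=true.
              static void describe(ZkStateReader reader, String name) {
                ClusterState clusterState = reader.getClusterState();
                DocCollection coll = clusterState.getCollection(name);
                if (coll.isExternal()) {
                  // state lives at /collections/<collection-name>/state, not in clusterstate.json
                  System.out.println(name + " is external; ZK state version = " + coll.getVersion());
                }
              }
            }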
          Noble Paul added a comment -

          a couple of tests fail

          Mark Miller added a comment -

          if(debugState &&

          Best to do that with debug logging level rather than introduce a debug sys prop for this class.

          Timothy Potter added a comment -

          Thanks for fixing the CloudSolrServerTest failure ... One thing I wasn't sure about when looking over the latest patch was whether allCollections in ZkStateReader will hold the names of external collections. I assume so given the name, but it doesn't seem like any external collection names are added to that Set currently.

          Noble Paul added a comment -

          Timothy Potter
          The allCollections set will store ALL collections. If you are looking at trunk: there are no external collections in trunk yet. Please apply the patch and check.

          Noble Paul added a comment -

          if(debugState &&

          Thanks for the suggestion. However, I added it for my dev testing; it will be removed before commit.

          Noble Paul added a comment -

          Timothy Potter I have added the INVALID_STATE check required at the server side in this patch. You can just use it directly in your SolrJ code.

          Timothy Potter added a comment -

          Hi Noble,

          I stashed all my local changes and reverted to latest revision on trunk. Then I applied your patch from today and ran the CloudSolrServerTest. It is failing with:

          <testcase classname="org.apache.solr.client.solrj.impl.CloudSolrServerTest" name="testDistribSearch" time="54.197">
          <failure message="There are still nodes recoverying - waited for 30 seconds" type="java.lang.AssertionError">java.lang.AssertionError: There are still nodes recoverying - waited for 30 seconds
          at __randomizedtesting.SeedInfo.seed([3E324A3A3D5047B3:BFD4C4224A0F278F]:0)
          at org.junit.Assert.fail(Assert.java:93)
          at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:172)
          at org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:707)
          at org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForThingsToLevelOut(AbstractFullDistribZkTestBase.java:1492)
          at org.apache.solr.client.solrj.impl.CloudSolrServerTest.doTest(CloudSolrServerTest.java:108)
          at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:843)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)

          The test passes without the patch. Any ideas what might cause this? I'll keep moving forward on 5474, but wanted to let you know about this issue.

          Cheers,
          Tim

          Mark Miller added a comment -

          Thanks for the suggestion.However ,I added it for my dev testing will be removed before commit.

          For that stuff, I've found that if it's useful for development, it will be useful for someone else when they are debugging. I used to do the same thing, and every time I end up wasting a lot of time putting it back after I take it out, uncommenting it after I comment it out or whatever. If you just put it under debug logging, it's more powerful (you can control things very nicely with config) and it will help you and future developers in the long run.
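
          A minimal sketch of the suggestion (assuming the class already carries an SLF4J logger, as Solr classes typically do):

            import org.slf4j.Logger;
            import org.slf4j.LoggerFactory;

            class StateDebugSketch {
              private static final Logger log = LoggerFactory.getLogger(StateDebugSketch.class);

              void onClusterStateUpdate(Object newState) {
                // Instead of guarding with a custom sys prop (if (debugState && ...)),
                // let the logging configuration decide whether the extra work happens.
                if (log.isDebugEnabled()) {
                  log.debug("cluster state updated: {}", newState);
                }
              }
            }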

          Mark Miller added a comment -

          When you are ready, it would be great to iterate on this in a branch. I could set up my Jenkins to run jobs on it as well.

          Timothy Potter added a comment - - edited

          I had to comment out the changes to SolrDispatchFilter in the latest patch because calling req.getParameter(STATE_CHECK_PARAM) leads to issues when parsing the request:

          64053 T32 oasc.SolrException.log ERROR null:org.apache.solr.common.SolrException:
          Solr requires that request parameters sent using application/x-www-form-urlencoded content-type
          can be read through the request input stream. Unfortunately, the stream was empty / not available.
          This may be caused by another servlet filter calling ServletRequest.getParameter*() before SolrDispatchFilter,
          please remove it.
          at org.apache.solr.servlet.SolrRequestParsers$FormDataRequestParser.getParameterIncompatibilityException(SolrRequestParsers.java:626)
          at org.apache.solr.servlet.SolrRequestParsers$FormDataRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:612)
          at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:678)
          at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150)

          Commenting out the changes in SolrDispatchFilter allows the CloudSolrServerTest to pass locally.

          Mark Miller added a comment -

          If that param is only expected to be sent via SolrJ, I solved this in the past by adding some methods to HttpSolrServer that let you say that these params must be sent as GET params, which you can parse before passing on the request.
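
          For illustration, one way to read such a GET-only parameter straight off the query string so the form body is never consumed before SolrDispatchFilter parses it. This is a sketch with an invented helper name; URL decoding is omitted for brevity:

            import javax.servlet.http.HttpServletRequest;

            class QueryStringParamSketch {
              // Reads a parameter from the query string only; never touches the request body.
              static String getQueryParam(HttpServletRequest req, String name) {
                String qs = req.getQueryString();   // does not consume the input stream
                if (qs == null) return null;
                for (String pair : qs.split("&")) {
                  int eq = pair.indexOf('=');
                  String key = eq >= 0 ? pair.substring(0, eq) : pair;
                  if (key.equals(name)) {
                    return eq >= 0 ? pair.substring(eq + 1) : "";
                  }
                }
                return null;
              }
            }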

          Noble Paul added a comment -

          Yeah we will have to move the query parsing before this is done.
          I'll move the code out to a method and keep it commented out for a while

          Noble Paul added a comment -

          statecheck commented out.

          Noble Paul added a comment -

          most test cases pass now

          Mark Miller added a comment -

          Perhaps time for a branch yet? Especially if Potter is doing stuff that interacts with this too, seems like it would make it easier for everyone to review / collaborate on these changes.

          Noble Paul added a comment -

          tests are still failing. But running a cluster manually works quite well

          Noble Paul added a comment -

          Where to branch? in SVN or a public git branch?

          Shalin Shekhar Mangar added a comment -

          In ZkStateReader.createClusterStateWatchersAndUpdate, we have:

          ClusterState clusterState =  ZkStateReader.this.clusterState;
          clusterState.setLiveNodes(liveNodesSet);
          

          Is the ClusterState not immutable anymore?

          Noble Paul added a comment -

          Not immutable. It makes no sense anymore. Think about the clusterState#getCollection() call. If it is an external collection, the returned value would be different all the time.

          Mark Miller added a comment -

          Where to branch? in SVN or a public git branch?

          Tough one...up to you I guess - I see pluses and minuses for both and don't have a strong opinion.

          Noble Paul added a comment -

          I would go with GitHub. I hope it's straightforward to set up Jenkins with that.

          Mark Miller added a comment -

          I hope it's straight forward to setup jenkins with that

          Yup - I use GitHub almost exclusively outside apache.

          Noble Paul added a comment -

          I could get all tests to pass.

          Noble Paul added a comment -

          The changes are updated to github at https://github.com/shalinmangar/lucene-solr

          Please take a look at the code and pass your review comments

          Noble Paul added a comment -

          Got rid of ref counting in ZkStateReader for watching external collections, in favor of a real-time check of cores->collections to decide whether a watch is required.

          Noble Paul added a comment -

          Added a testcase

          Noble Paul added a comment -

          updated to latest trunk

          Mark Miller added a comment -

          Hmm...it says it's at github, but then you put up a patch?

          Is the github branch up to date?

          Mark Miller added a comment -

          Also, what branch is it on?

          Timothy Potter added a comment -

          I'm not sure if this github is accessible to you but I know Noble and Shalin have been working on this in the solr-5381 branch at https://github.com/shalinmangar/lucene-solr. The patch I posted for 5474 is based on this branch too.

          Noble Paul added a comment -

          updated to trunk

          Shalin Shekhar Mangar added a comment -

          Some comments on the latest patch:

          1. AbstractFullDistribZkTestBase has a useExternalCollection() which is hard coded to false. Why? Can we randomize using external collections in the base test to have better test coverage?
          2. ClusterState.getCollections has a todo which says “fix later JUnit is failing”. Which test is failing?
          3. What is stateVer used for? I guess it is for SOLR-5474 and not this issue?
          4. This patch has only whitespace related changes to CloudSolrServer.
          5. There is wrong formatting and incorrect spacing in the new code, such as Overseer.createCollection, new methods in ClusterState, etc. You should re-format all the new/modified code blocks.
          6. There was one forbidden-api check failure where new String(byte[]) constructor is used in a log message. Run ant check-forbidden-apis from inside the solr directory.
          7. There are three javadoc errors (run ant precommit):
            [ecj-lint] 1. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 199)
             [ecj-lint] 	/** @deprecated
             [ecj-lint] 	     ^^^^^^^^^^
             [ecj-lint] Javadoc: Description expected after @deprecated
             [ecj-lint] ----------
             [ecj-lint] 2. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 297)
             [ecj-lint] 	* @deprecated
             [ecj-lint] 	   ^^^^^^^^^^
             [ecj-lint] Javadoc: Description expected after @deprecated
             [ecj-lint] ----------
             [ecj-lint] ----------
             [ecj-lint] 3. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java (at line 759)
             [ecj-lint] 	* @param coll
             [ecj-lint] 	         ^^^^
             [ecj-lint] Javadoc: Description expected after this reference
             [ecj-lint] ----------
             [ecj-lint] 3 problems (3 errors)
            
          Mark Miller added a comment -

          I'm fairly busy in the short term - going out of town for a few days. But I intend to review this as well.

          Noble Paul added a comment -

          What is stateVer used for?

          This is a part of SOLR-5474 this issue will be committed along with that.

          Noble Paul added a comment -

          Added SOLR-5474 also to the patch.

          Deprecated API calls removed.

          TODO:
          • Javadoc errors yet to be fixed
          • More tests using external collections

          Shalin Shekhar Mangar added a comment -

          Two comments specific to the SOLR-5474 part which has been included in the latest patch:

          1. The ExpireOnGetHashMap is not thread-safe. The eviction logic in ExpireOnGetHashMap synchronises on ‘this’ but that is insufficient. The requestWithRetryOnStaleState method calls put and remove without synchronizing on the map so it is open to a ConcurrentModificationException. I think we can just extend ConcurrentHashMap instead and not worry about synchronisation at all.
          2. Now that we have eliminated LazyCloudSolrServer, the method signature changes from private to protected are no longer necessary.
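
          The ExpireOnGetHashMap itself is not attached to this issue, so purely as a sketch of the ConcurrentHashMap-based direction suggested in point 1 (class and method names here are invented, as is the TTL-based eviction policy):

            import java.util.concurrent.ConcurrentHashMap;

            // Invented names; entries expire when read, and ConcurrentHashMap supplies
            // the thread safety instead of synchronized blocks.
            class Timestamped<V> {
              final V value;
              final long createdAt = System.currentTimeMillis();
              Timestamped(V value) { this.value = value; }
            }

            class ExpireOnGetConcurrentMap<K, V> extends ConcurrentHashMap<K, Timestamped<V>> {
              private final long ttlMillis;

              ExpireOnGetConcurrentMap(long ttlMillis) { this.ttlMillis = ttlMillis; }

              V getValue(K key) {
                Timestamped<V> e = get(key);
                if (e == null) return null;
                if (System.currentTimeMillis() - e.createdAt > ttlMillis) {
                  remove(key, e);   // atomic remove-if-unchanged, safe against concurrent puts
                  return null;
                }
                return e.value;
              }

              void putValue(K key, V value) { put(key, new Timestamped<>(value)); }
            }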
          Noble Paul added a comment -

          fixing concurrency issue in the last patch in CloudSolrServer

          Noble Paul added a comment -

          fixed a small bug

          Timothy Potter added a comment -

          I'm seeing a race condition when creating new external collections where the leader=true flag is getting lost on some of the shards. I'm calling this a race condition because it does not happen all the time, but is fairly consistent in my cluster. I'm trying to create an external collection with 4 shards and RF=3 in a 4-node cluster using this patch, e.g.

          http://HOST:PORT/solr/admin/collections?action=CREATE&name=tj4&replicationFactor=3&numShards=4&collection.configName=cloud&maxShardsPerNode=3&external=true

          After running this, the collection ended up being invalid because shard2 does not have a leader.

          Here are some interesting log messages from a few of the servers involved in the leader election of shard2.

          btw - those nodes are dead now, so I didn't bother obfuscating hostnames

          >>>> ec2-50-16-38-73 NODE THAT THINKS IT IS THE LEADER <<<<

          2014-03-03 20:26:17,414 [qtp336293446-23] INFO solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync
          2014-03-03 20:26:17,414 [qtp336293446-23] INFO solr.cloud.SyncStrategy - Sync replicas to http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/
          2014-03-03 20:26:17,414 [qtp336293446-23] INFO solr.update.PeerSync - PeerSync: core=tj4_shard2_replica3 url=http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr START replicas=http://ec2-23-20-119-52.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica2/, http://ec2-54-237-76-40.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica1/ nUpdates=100
          2014-03-03 20:26:17,415 [qtp336293446-23] INFO solr.update.PeerSync - PeerSync: core=tj4_shard2_replica3 url=http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr DONE. We have no versions. sync failed.
          2014-03-03 20:26:17,415 [qtp336293446-23] INFO solr.cloud.SyncStrategy - Leader's attempt to sync with shard failed, moving to the next candidate
          2014-03-03 20:26:17,415 [qtp336293446-23] INFO solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
          2014-03-03 20:26:17,415 [qtp336293446-23] INFO solr.cloud.ShardLeaderElectionContext - I am the new leader: http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/ shard2
          2014-03-03 20:26:17,415 [qtp336293446-23] INFO common.cloud.SolrZkClient - makePath: /collections/tj4/leaders/shard2
          2014-03-03 20:26:17,423 [qtp336293446-23] INFO solr.cloud.ShardLeaderElectionContextBase - For shard2, offering ZkNodeProps JSON:

          { "operation":"leader", "shard":"shard2", "collection":"tj4", "base_url":"http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica3", "state":"active"}

          ; current queue size is: 1
          2014-03-03 20:26:17,426 [qtp336293446-23] INFO solr.cloud.ShardLeaderElectionContext - runLeaderProcess(false) succeeded for shard2
          2014-03-03 20:26:17,441 [main-EventThread] INFO common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tj4/state via watch on path /collections/tj4/state, has occurred - updating...
          2014-03-03 20:26:17,444 [main-EventThread] INFO common.cloud.ZkStateReader - Updating data for tj4
          2014-03-03 20:26:17,444 [main-EventThread] INFO common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
          2014-03-03 20:26:17,469 [main-EventThread] INFO common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tj4/state via watch on path /collections/tj4/state, has occurred - updating...
          2014-03-03 20:26:17,471 [main-EventThread] INFO common.cloud.ZkStateReader - Updating data for tj4
          2014-03-03 20:26:17,471 [main-EventThread] INFO common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
          2014-03-03 20:26:17,476 [qtp336293446-23] INFO solr.cloud.ZkController - We are http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/ and leader is http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/
          2014-03-03 20:26:17,476 [qtp336293446-23] INFO solr.cloud.ZkController - No LogReplay needed for core=tj4_shard2_replica3 baseURL=http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr
          2014-03-03 20:26:17,476 [qtp336293446-23] INFO solr.cloud.ZkController - I am the leader, no recovery necessary
          2014-03-03 20:26:17,476 [qtp336293446-23] INFO solr.cloud.ZkController - publishing core=tj4_shard2_replica3 state=active

          >>>>> OVER ON THE OVERSEER NODE ec2-23-20-119-52 <<<<<<

          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - Processing leader operation for shard2; coreName=tj4_shard2_replica3
          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - Applying leader change for shard2 for tj4; leaderUrl=http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/
          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - Set replica core_node7 as the leader for shard shard2 for collection tj4; leaderUrl=http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr/tj4_shard2_replica3/
          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - updateNodes: put /collections/tj4/state
          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - Created new ClusterState for shard2 for collection tj4: live nodes:[ec2-54-198-254-111.compute-1.amazonaws.com:8984_solr, ec2-54-237-76-40.compute-1.amazonaws.com:8984_solr, ec2-50-16-38-73.compute-1.amazonaws.com:8984_solr, ec2-23-20-119-52.compute-1.amazonaws.com:8984_solr] collections:{}
          2014-03-03 20:26:17,008 [Thread-21] INFO solr.cloud.Overseer - After processsing leader, clusterState: live nodes:[ec2-54-198-254-111.compute-1.amazonaws.com:8984_solr, ec2-54-237-76-40.compute-1.amazonaws.com:8984_solr, ec2-50-16-38-73.compute-1.amazonaws.com:8984_solr, ec2-23-20-119-52.compute-1.amazonaws.com:8984_solr] collections:{}
          2014-03-03 20:26:17,013 [Thread-21] INFO solr.cloud.Overseer - updateNodes not empty: 1
          2014-03-03 20:26:17,014 [Thread-21] INFO solr.cloud.Overseer - going to update_collection live nodes:[] collections:{tj4=DocCollection(tj4)={
          "shards":{
          "shard1":{
          ....
          "shard2":{
          "range":"c0000000-ffffffff",
          "state":"active",
          "replicas":{
          "core_node1":

          { "state":"down", "base_url":"http://ec2-23-20-119-52.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica2", "node_name":"ec2-23-20-119-52.compute-1.amazonaws.com:8984_solr"}

          ,
          "core_node5":

          { "state":"down", "base_url":"http://ec2-54-237-76-40.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica1", "node_name":"ec2-54-237-76-40.compute-1.amazonaws.com:8984_solr"}

          ,
          "core_node7":

          { "state":"down", "base_url":"http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica3", "node_name":"ec2-50-16-38-73.compute-1.amazonaws.com:8984_solr", "leader":"true"}

          }}, <====== SHARD #2's LEADER!!!

          NOTE: JSON trimmed for brevity

          Notice that the correct shard is marked as the leader in the state.json znode for this collection; this is expected and good.

          However, sometime very soon after, which you can see in the attached log ec2-23-20-119-52_solr.log around lines 28577 - 28637, a new state is published that has now lost the leader property for shard2.

          "shard2":{
          "range":"c0000000-ffffffff",
          "state":"active",
          "replicas":{
          "core_node1":

          { "state":"down", "base_url":"http://ec2-23-20-119-52.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica2", "node_name":"ec2-23-20-119-52.compute-1.amazonaws.com:8984_solr"}

          ,
          "core_node5":

          { "state":"down", "base_url":"http://ec2-54-237-76-40.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica1", "node_name":"ec2-54-237-76-40.compute-1.amazonaws.com:8984_solr"}

          ,
          "core_node7":

          { "state":"down", "base_url":"http://ec2-50-16-38-73.compute-1.amazonaws.com:8984/solr", "core":"tj4_shard2_replica3", "node_name":"ec2-50-16-38-73.compute-1.amazonaws.com:8984_solr"}

          }},
          <====== NO LONGER THE LEADER ?!?

          See attached logs for these two servers.

          Looks to me like somehow, depending on timing of status updates to collection state, the leader flag on one or more of the shards can get lost. Again, this doesn't happen all the time but does happen fairly often in a cluster that is networked. I haven't been able to get it to happen when running 4-nodes on my workstation.

          Noble Paul added a comment -

          I have updated the patch w/ trunk.

          CollectionsAPIDistributedZkTest randomly uses external collections in the tests.

          Jessica Cheng Mallet added a comment -

          In Overseer.run, the "process the workqueue items leftover by the last Overseer" code seems unimplemented for external collections.

          On roughly line 135-136:
          clusterState = processMessage(clusterState, message, operation);
          zkClient.setData(ZkStateReader.CLUSTER_STATE, ZkStateReader.toJSON(clusterState), true);

          It always just updates clusterstate.json and never cares about the external state.json, instead of going through the same logic as below for checking !updateNodes.isEmpty() and isClusterStateModified.
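
          To make the concern concrete, the leftover-workqueue branch would need roughly this shape. This is a hedged sketch mirroring the quoted fragment: the isExternal check and the state path construction are placeholders, not actual methods in the patch.

            // Sketch only: route the write to the collection's own state node when external.
            clusterState = processMessage(clusterState, message, operation);
            String coll = message.getStr(ZkStateReader.COLLECTION_PROP);
            if (coll != null && isExternal(clusterState, coll)) {          // placeholder check
              String statePath = "/collections/" + coll + "/state.json";   // path per the description
              zkClient.setData(statePath,
                  ZkStateReader.toJSON(clusterState.getCollection(coll)), true);
            } else {
              zkClient.setData(ZkStateReader.CLUSTER_STATE, ZkStateReader.toJSON(clusterState), true);
            }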

          Noble Paul added a comment -

          Good catch, Jessica Cheng Mallet. Here is the fix.

          Noble Paul added a comment -

          Catching up with the ever changing trunk.

          Noble Paul added a comment -

          The last patch missed the testcase

          Noble Paul added a comment -

          CloudSolrServerTest using external collections.

          Noble Paul added a comment -

          Updated with trunk after SOLR-5477 commit

          Shalin Shekhar Mangar added a comment -

          I think we should add the external flag in the SplitShardTest and MigrateRouteKeyTest as well to get more coverage.

          Anshum Gupta added a comment -

          Adding to that, testing this for ASYNC calls would be good.

          Anshum Gupta added a comment -

          Can you also remove the redundant typing (diamond operator) from the patch?
          You (or your IDE) seem to have added them. We should avoid adding those, especially after LUCENE-5512 went in.
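
          For reference (LUCENE-5512 moved the codebase to the diamond operator), the kind of redundancy being pointed at looks like this:

            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            class DiamondOperatorExample {
              // Repeating the type arguments on the right-hand side is the redundant typing;
              // the diamond form is what the codebase standardized on.
              Map<String, List<String>> redundant = new HashMap<String, List<String>>(); // avoid
              Map<String, List<String>> preferred = new HashMap<>();                     // prefer
            }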

          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk.

          Anshum Gupta added a comment -

          More comments:

          1. Removing the commented out code would be good:
            ZkStateReader.java
             @@ -283,6 +298,12 @@
            
                           ClusterState clusterState = ClusterState.load(stat.getVersion(), data, ln,ZkStateReader.this);
                           // update volatile
                           ZkStateReader.this.clusterState = clusterState;
            +
            +              updateCollectionNames();
            +//              HashSet<String> all = new HashSet<>(colls);;
            +//              all.addAll(clusterState.getAllInternalCollections());
            +//              all.remove(null);
            +
            
          2. The following code reverts changes from SOLR-5770
            ClusterState.java
            @@ -198,16 +210,28 @@
            -  public String getShardId(String nodeName, String coreName) {
            +
             public String getShardId(String collectionName, String baseUrl, String coreName) {
            
          3. Can we change the new method signature from “DocCollection(version,name, slices, props, router)” to “DocCollection(name, slices, props, router, version)” ? I understand it’s more superficial but it looks better to have extended method signature grow to the right.
          4. Can you put the “// clean work queue” comment back in the code in Overseer.java?
          5. The indentation (primarily spaces) seems off at a few places. Would be good to fix that.
          6. Seems like this would not work for external collections. Do you plan to handle this before you commit?
            Assign.java
            if(c == null) continue; //TODO check this later
            
          Shalin Shekhar Mangar added a comment -

          Thanks Anshum.

          I've fixed all of your points except for indentation. I'm working on adding some tests and I'll take care of it before the next patch.

          Next to come:

          1. Randomize AbstractFullDistribZkTestBase.useExternalCollection so that all cloud tests use external collection sometimes.
          2. Write a test which exercises SolrDispatchFilter logic with missing and invalid _stateVer_ values.
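
          A minimal sketch of item 1 above, as a fragment of the base test class (the method's exact visibility and the randomization ratio are assumptions; random() is the seeded helper from the randomized-testing base class):

            // In AbstractFullDistribZkTestBase: replace the hard-coded false so the
            // test seed occasionally turns on external collections across cloud tests.
            protected boolean useExternalCollection() {
              return random().nextInt(5) == 0;   // external roughly one run in five
            }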
          Shalin Shekhar Mangar added a comment -

          The previous patch missed reverting baseUrl to nodeName in ZkController.getShardId.

          I'm still seeing some test failures. For example in OverseerStatusTest, it looks like the OverseerCollectionProcessor is waiting for a new collection to be created but Overseer is waiting for new items in the queue?

          "Overseer-91493854598660099-127.0.0.1:52655_o_%2Fcs-n_0000000000" daemon prio=10 tid=0x00007fb7a4299000 nid=0x32be waiting on condition [0x00007fb810168000]
          java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(Native Method)
          at org.apache.solr.cloud.OverseerCollectionProcessor.createCollection(OverseerCollectionProcessor.java:1999)
          at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:449)
          at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:242)
          at java.lang.Thread.run(Thread.java:744)

          "Thread-12" daemon prio=10 tid=0x00007fb7a4297800 nid=0x32bd in Object.wait() [0x00007fb810269000]
          java.lang.Thread.State: TIMED_WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)

          - waiting on <0x00000007783049c8> (a java.lang.Object)
          at org.apache.solr.cloud.DistributedQueue$LatchChildWatcher.await(DistributedQueue.java:238)
          - locked <0x00000007783049c8> (a java.lang.Object)
          at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:464)
          at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:428)
          at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:217)
          at java.lang.Thread.run(Thread.java:744)

          Digging into it now.

          Shalin Shekhar Mangar added a comment -

          My earlier patches had left out handling of external collections from the createCollection method in Overseer which was causing the test hangs. This is fixed in this patch.

          Shalin Shekhar Mangar added a comment -

          I ran into a weird error on one of my test runs:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=CollectionsAPIDistributedZkTest -Dtests.method=testDistribSearch -Dtests.seed=3E56FB4CCC2F1223 -Dtests.slow=true -Dtests.locale=fr_FR -Dtests.timezone=Asia/Hovd -Dtests.file.encoding=ISO-8859-1
             [junit4] ERROR   86.7s J0 | CollectionsAPIDistributedZkTest.testDistribSearch <<<
             [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1767, name=RecoveryThread, state=RUNNABLE, group=TGRP-CollectionsAPIDistributedZkTest]
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3E56FB4CCC2F1223:BFB07554BB70721F]:0)
             [junit4]    > Caused by: java.lang.StackOverflowError
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3E56FB4CCC2F1223]:0)
             [junit4]    > 	at org.apache.log4j.spi.LoggingEvent.getLocationInformation(LoggingEvent.java:253)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout._format(SolrLogLayout.java:124)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout.format(SolrLogLayout.java:110)
             [junit4]    > 	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
             [junit4]    > 	at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
             [junit4]    > 	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
             [junit4]    > 	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
             [junit4]    > 	at org.apache.log4j.Category.callAppenders(Category.java:206)
             [junit4]    > 	at org.apache.log4j.Category.forcedLog(Category.java:391)
             [junit4]    > 	at org.apache.log4j.Category.log(Category.java:856)
             [junit4]    > 	at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:400)
             [junit4]    > 	at org.apache.solr.common.cloud.ZkStateReader.getExternCollection(ZkStateReader.java:722)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.loadExtDocCollection(ClusterState.java:178)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(ClusterState.java:185)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.getReplica(ClusterState.java:118)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout.getReplicaProps(SolrLogLayout.java:234)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout._format(SolrLogLayout.java:164)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout.format(SolrLogLayout.java:110)
             [junit4]    > 	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
             [junit4]    > 	at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
             [junit4]    > 	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
             [junit4]    > 	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
             [junit4]    > 	at org.apache.log4j.Category.callAppenders(Category.java:206)
             [junit4]    > 	at org.apache.log4j.Category.forcedLog(Category.java:391)
             [junit4]    > 	at org.apache.log4j.Category.log(Category.java:856)
             [junit4]    > 	at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:400)
             [junit4]    > 	at org.apache.solr.common.cloud.ZkStateReader.getExternCollection(ZkStateReader.java:722)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.loadExtDocCollection(ClusterState.java:178)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(ClusterState.java:185)
             [junit4]    > 	at org.apache.solr.common.cloud.ClusterState.getReplica(ClusterState.java:118)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout.getReplicaProps(SolrLogLayout.java:234)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout._format(SolrLogLayout.java:164)
             [junit4]    > 	at org.apache.solr.util.SolrLogLayout.format(SolrLogLayout.java:110)
             [junit4]    > 	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
          

          It seems that the logger is trying to read a replica property which doesn't exist, and therefore calls getExternCollection, which also fails and tries to log a warning, causing an infinite loop and a stack overflow.

          Shalin Shekhar Mangar added a comment - edited

          All Solr core tests pass with this patch. After discussing offline with Noble, I introduced a new method ClusterState.getCachedReplica which is exactly like getReplica except that it fetches only the information available in locally cached data and never hits ZK. The older getReplica and the new getCachedReplica method are used only by the SolrLogLayout and SolrLogFormatter classes, so these should never hit ZK anyway.
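
          For illustration only, the cache-only lookup amounts to something along these lines (a simplified sketch, not the exact patch code; it only consults the locally cached collection map):

            // Simplified sketch of a cache-only replica lookup. It consults only the
            // collections already deserialized into the local map and never triggers
            // a ZooKeeper read, so it is safe to call from logging code.
            public Replica getCachedReplica(String collectionName, String coreNodeName) {
              DocCollection coll = collectionStates.get(collectionName); // local cache only
              if (coll == null) return null;                             // unknown or not yet cached
              for (Slice slice : coll.getSlices()) {
                Replica replica = slice.getReplica(coreNodeName);
                if (replica != null) return replica;
              }
              return null;
            }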

          However, there is a SolrJ test failure in CloudSolrServerTest on asserts added by SOLR-5715

            [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=CloudSolrServerTest -Dtests.method=testDistribSearch -Dtests.seed=5FAC2B1757C387B3 -Dtests.slow=true -Dtests.locale=sv_SE -Dtests.timezone=Pacific/Samoa -Dtests.file.encoding=US-ASCII
             [junit4] FAILURE 22.1s J1 | CloudSolrServerTest.testDistribSearch <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: Unexpected number of requests to expected URLs expected:<6> but was:<0>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([5FAC2B1757C387B3:DE4AA50F209CE78F]:0)
             [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrServerTest.doTest(CloudSolrServerTest.java:300)
             [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:867)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
             [junit4]   2> 26918 T10 oas.SolrTestCaseJ4.deleteCore ###deleteCore
          
          Noble Paul added a comment -

          The CloudSolrServer test was always firing an mbeans request to the default collection "collection1", so the test was failing. That is fixed now.

          Shalin Shekhar Mangar added a comment -

          Thanks Noble. All tests pass with your patch. I am working on enabling external collections for more cloud tests.

          Noble Paul added a comment -

          The ExternalCollectionsTest was missing in the last patch

          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk after SOLR-5859 commit.

          Shalin Shekhar Mangar added a comment -

          I tested quite a bit with external set to true in AbstractFullDistribZkTestBase. This patch fixes failures in the CLUSTERSTATUS collection API that were exposed by this testing. This patch also sets external=true randomly in the base test class.

          Mark Miller added a comment -

          Personally, I'm not that big on this 'external' terminology or option. I think if we split up clusterstate.json by collection, that should be the future and the transition should appear less "optional" and "feature choice" like.

          Noble Paul added a comment -

          Yes, I agree with you, but I would like it to become the default mechanism after a couple of releases (maybe in 5.0). We will need to support both in the code because:

          • All the users' existing collections already live in the main clusterstate.json and they must still be supported, so the change cannot be immediate.
          • At some point we should help users migrate to the new format using a collection API.

          In the interim we need a way to refer to the new mode. The term should go away soon and it need not be visible to users.

          Mark Miller added a comment -

          Yeah, I have no problem supporting both for back compat - but I think "the transition should appear less "optional" and "feature choice" like".

          External just seems like a configuration. Perhaps something more along the lines of zk=truth path - legacyClusterState or some such thing that doesn't look like an option to the user, but an old mode going away.

          Noble Paul added a comment -

          Yeah, got your point.

          Something like legacyFormat=false.

          Alan Woodward added a comment -

          How about an explicit version numbering system? Having a binary setting (legacyClusterState=true|false) locks us in a bit if we want to make further changes in the future.

          • clusterStateVersion=1 == current monolithic clusterstate.json
          • clusterStateVersion=2 == each collection has its own clusterstate.json
          • clusterStateVersion=3 == crazy futuristic configuration stored on stone tablets, or whatever.

          If the collection doesn't have a clusterStateVersion property, assume it's 1.
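
          To illustrate, reading code could branch on such a property roughly like this (just a sketch; the property lookup and the two helper methods are hypothetical):

            // Hypothetical sketch: pick the reader based on a per-collection version
            // property, defaulting to 1 (the current monolithic format) when absent.
            int version = props.containsKey("clusterStateVersion")
                ? Integer.parseInt(props.get("clusterStateVersion").toString())
                : 1;
            switch (version) {
              case 1:  // all collections live in the global clusterstate.json
                return readFromGlobalClusterState(collectionName);  // hypothetical helper
              case 2:  // the collection has its own state node under /collections/<name>/
                return readFromPerCollectionState(collectionName);  // hypothetical helper
              default:
                throw new IllegalStateException("Unknown clusterStateVersion: " + version);
            }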

          Noble Paul added a comment -

          Alan Woodward I guess we should go with that. In this case the format is a per-collection thing, so we can have two different collection versions co-existing in the system. The version is also used for the ZK node version.

          So we should qualify it explicitly, something like stateFormatVersion=2.

          Mark Miller added a comment -

          Yeah, moving to a version sounds good.

          Mark Miller added a comment -

          Does an 'external' collection also entail not watching state anymore - e.g., is that issue wrapped into this one? I have not had a chance to do a code review yet, but hope to start some review soon. I'm hoping not watching state is not tied into splitting up clusterstate.json though, and comments I've been reading about 'external' collections elsewhere started making me think the two were entwined.

          Noble Paul added a comment - edited

          Actually, the main clusterstate.json is always watched. The other individual collection states are watched only as required. The ticket is SOLR-5474.

          Mark Miller added a comment -

          If you have a clusterstate.json per collection, there is no main clusterstate.json?

          The other individual collection states are watched only as required

          When is it required?

          Noble Paul added a comment -

          If you have a clusterstate.json per collection, there is no main clusterstate.json?

          If a collection is created in the old format, it still goes to the main clusterstate.json. The main clusterstate.json is used as a notification mechanism for everyone to know about the creation/deletion of collections.

          When is it required?

          If a node has cores which are part of a particular collection, that node actively watches that collection's state.
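
          As a rough sketch of what "actively watches" means (zk here is a plain ZooKeeper client, the helper methods are made up, and the path follows the issue description rather than any final layout):

            // Illustrative only: register a ZooKeeper watch just for the collections this
            // node hosts cores for, so unrelated collections generate no watch traffic here.
            for (final String collection : collectionsHostedOnThisNode()) {   // hypothetical helper
              final String path = "/collections/" + collection + "/state.json";
              zk.getData(path, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                  // Watches are one-shot: re-read this collection's state and re-register.
                  refreshCollectionState(collection);                         // hypothetical helper
                }
              }, null);
            }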

          Noble Paul added a comment - edited

          Added tests for CloudSolrServer/SolrDispatchFilter.
          Fixed a bug with stale watched collections (SOLR-5952).

          Noble Paul added a comment -

          Removed the external=true option from CREATECOLLECTION; stateFormat=2 is the new equivalent.

          Noble Paul added a comment - edited

          All tests now randomly use external collections (even collection1).

          A few bug fixes:

          • When a single state command made multiple updates to the same collection (as in split), the first update was overwritten.
          • When a client had a collection state newer than the server's, a STALE error was thrown; the server now updates its own state instead (see the sketch below).
          • ZkStateReader.updateClusterState() was not updating external collections.

          I have done all the planned testing for this. I plan to commit this to trunk (4x branch later) soon if there are no more concerns.
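
          For what it's worth, the stale-state handling in the second bullet amounts to something like this (purely illustrative; every name below is made up and none of it is the actual code):

            // Conceptual sketch of the versioned-state exchange, all names hypothetical.
            int clientVersion = clientSuppliedStateVersion(request);    // version the client's cache has seen
            int serverVersion = locallyCachedStateVersion(collection);  // version this server has cached

            if (clientVersion > serverVersion) {
              // The client has seen newer state than this server: refresh the server's
              // cached copy from ZooKeeper instead of wrongly reporting STALE.
              refreshStateFromZooKeeper(collection);
            } else if (clientVersion < serverVersion) {
              // The client is behind: signal it to invalidate its cached state and retry.
              signalStaleStateToClient(response);
            }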

          Shalin Shekhar Mangar added a comment -

          Thanks Noble.

          There were a few conflicts in CollectionsAPIDistributedZKTest which I have fixed in this patch. I also introduced a system property "tests.solr.stateFormat" which sets the stateFormat to be used for the default collection. If this property is not set then the state format is chosen randomly.

          Shalin Shekhar Mangar added a comment -

          This fixes failures in OverseerStatusTest when testing with -Dtests.solr.stateFormat=2. The failure was because the test wrongly assumed that the collection create command it issues is the only collection create command seen by the overseer. The default collection is also created by the overseer when stateFormat=2.

          Noble Paul added a comment -

          The patch had introduced some unwanted changes to AbstractFullDistribZkTestBase. I removed the code changes to CollectionsAPIDistributedZkTest that made individual methods use stateFormat=2; they now piggyback on a common method which can be controlled using the system property.

          ASF subversion and git services added a comment -

          Commit 1587834 from noble@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1587834 ]

          SOLR-5473: Make one state.json per collection , SOLR-5474: Have a new mode for SolrJ to support stateFormat=2

          ASF subversion and git services added a comment -

          Commit 1587876 from noble@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1587876 ]

          SOLR-5473 forbidden API usage

          Mark Miller added a comment -

          Wow dude - huge fing patch and change to commit with no warning to anyone you are going to commit.

          Shalin Shekhar Mangar added a comment -

          Wow dude - huge fing patch and change to commit with no warning to anyone you are going to commit.

          https://issues.apache.org/jira/browse/SOLR-5473?focusedCommentId=13965373&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965373

          Mark Miller added a comment -

          Thanks, I missed that - bummer. I've got some issues with this.

          Shalin Shekhar Mangar added a comment -

          That's okay. This is only on trunk so we can incorporate bug fixes and any suggestions the community may have before we land it on branch_4x. I'd also like to set up a Jenkins build which tests with -Dtests.solr.stateFormat=2 exclusively so that we can find issues faster. Can you help with that?

          Mark Miller added a comment -

          So I skimmed Noble's comment above about updating the patch and missed the line in there about committing to 5X. I've got a review coming before long.

          There is still internal / external stuff in here that should not be, other little things like: "hopefully pipe is not an allowed char in a collection name", etc.

          Will take me a bit to do a full review.

          Noble Paul added a comment -

          Mark Miller, we can do a couple of iterations on trunk itself before we move it to a release. Please give your feedback and I'll be glad to incorporate it.

          Mark Miller added a comment -

          I'd also like to set up a Jenkins build which tests with -Dtests.solr.stateFormat=2 exclusively so that we can find issues faster. Can you help with that?

          For a start, I'll add a job to my fullmetaljenkins.org machine with that.

          Noble Paul added a comment -

          There is still internal / external stuff in here that should not be,

          The internal/external are not visible anywhere for the user. However, they are there in the API, for example DocCollection.isExternal().

          Mark Miller added a comment -

          Yeah, I know, but it should not be an internal thing either. I've started a list of issues, but I'm pretty busy with some other things, so I probably won't have a review finished until the weekend.

          Mark Miller added a comment -

          There are important omissions on credit for this issue by the way.

          * SOLR-5473: Make one state.json per collection (Noble Paul)

          Shalin also posted patches and did work on this issue, and Alan made a large impact with his design suggestion. Anshum also did review and made many suggestions. They all need credit in the CHANGES entry.

          Mark Miller added a comment -

          The internal/external are not visible anywhere for the user.

          It's visible right here in some logging: Registering watch for external collection {}

          In any case, it all needs to be removed. There is no such thing as an external collection in Solr and having it in the code is confusing and sloppy. We have a format version for the clusterstate.json and the external and internal code should reflect that.

          Mark Miller added a comment -

          The following would be good to clean up in particular and really is also exposed to the user. This doesn't really jibe with our new format strategy and the methods have no javadoc - also, what is a 'common' collection now? At a minimum, these methods should be well doc'd or marked internal/expert, but it would be better to think about how we should handle this in light of our format versioning strategy rather than the old internal/external one-off strategy. If we can't do that nicely, we should make comments noting the failure and warning the user.

            public boolean hasExternalCollection(String coll) {
              return stateReader.getAllCollections().contains(coll) && !collectionStates.containsKey(coll);
          
            }
            public Set<String> getAllInternalCollections(){
              return Collections.unmodifiableSet(collectionStates.keySet());
            }
          
            ...
          
            public DocCollection getCommonCollection(String name){
              return collectionStates.get(name);
          
            }
          
          Mark Miller added a comment -

          ClusterState#getCachedCollection and ClusterState#getCachedReplica could use better names. ClusterState itself is a cached object and getCollection is already giving you cached state.

          Noble Paul added a comment - edited

          At a minimum, these methods should be well doc'd or marked internal/expert

          Actually we need to mark those methods as internal/expert. I don't see a more appropriate method name for those

          ClusterState#getCachedCollection and ClusterState#getCachedReplica

          ClusterState#getCachedCollection can be made private. ClusterState#getCachedReplica can be renamed to ClusterState#getReplica(String collectionName, String coreNodeName, boolean cachedOnly).

          I can't think of a better name for getCommonCollection(); we can mark it as 'internal' only.

          It's visible right here in some logging: Registering watch for external collection {}

          I missed the logging part. That'll be fixed.

          And credits, of course

          Mark Miller added a comment -

          I don't see a more appropriate method name for those

          I think more appropriate names are something along the lines of Format1 and Format2. There is really no reason to have a single internal/external in the patch as it just adds confusion. And we should also have good javadoc around the format feature and what the differences are between the formats. It should be simple for a new developer to come on and understand what is going on around the format version.

          Noble Paul added a comment -

          I think more appropriate names are something along the lines of Format1 and Format2

          I think the name is useful. format2 is not self-explanatory, but external/internal is. Down the line, when we have moved past 2, will anyone know what format 2 meant? But in the API it helps to have internal/external, IMO.

          Noble Paul added a comment -

          Removed references to external collections.

          Ryan Ernst added a comment -

          format2 is not self-explanatory, but external/internal is. Down the line, when we have moved past 2, will anyone know what format 2 meant? But in the API it helps to have internal/external, IMO.

          external/internal is just as cryptic. In either case (formatN or some short one word description), someone working on building a new format, or upgrading, would need more detail (docs or code comment). So I think formatN is better because it simplifies what the next one will be, vs having to come up with a debatable one word description of the format (and what if there are multiple things changing in a new format?).

          Mark Miller added a comment -

          external/internal is just as cryptic.

          Not only is it cryptic, but it's just poor planning. We moved to FormatN for a reason. We can't just keep adding new random naming conventions every time we change the format. Anything related to different zk formats now should lead back to FormatN. FormatN won the discussion and internal/external remnants have a -1 from me.

          Like I mentioned above, FormatN needs to be documented, the changes between formats need to be documented, and we need to refer to those formats and not make up unnecessary and undefined random names. We can't be lazy about large core changes like this.

          Mark Miller added a comment -

          I've reopened this for a bit - just until I can finish a code review.

          Noble Paul added a comment -

          Mark Miller, please apply the latest patch for review.

          Mark Miller added a comment -

          I'll take a look ASAP.

          I was just looking at the zk dump for a stateFormat=2 setup and had a couple of comments:

              /solr/collections/collection1/state (0)
              DATA:
                  {"collection1":{
                      "stateFormat":2,
          
          • I think it should be state.json just like clusterstate.json.
          • Why is stateFormat in the collection? It seems we would already have to know it's stateFormat=2 to even be reading the info, so does this really provide anything?
          Noble Paul added a comment -

          Why is stateFormat in the collection

          We are using stateFormat to differentiate between two different versions. If we have a version 3 later, how do we differentiate between versions? It is just a good practice to explicitly state the versions

          I think it should be state.json just like clusterstate.json.

          We could do that. I was thinking that we are planning to keep almost everything in json format, so is the suffix redundant?

          Mark Miller added a comment -

          It is just a good practice to explicitly state the versions

          Can you give a concrete example of how having that info there will be useful? It still makes no sense to me.

          keep almost everything in json format, so is the suffix redundant?

          The 'almost' makes it not redundant. So far, we have taken the convention of naming it clusterstate.json - it used to be clusterstate.xml. Just like when reading a file from a filesystem, an extension is useful when reading a file from ZooKeeper (you might think of it like a distributed file system).

          In any case, it's how we currently do things, and it should take an argument to change it rather than to keep it the same, I think.

          Mark Miller added a comment -

          I've started looking at this patch. It looks like this is not just making one state.json per collection?

          When you make one state.json per collection, does this patch also stop using watchers on those collections? That is, if I'm using stateFormat=2, cluster state will no longer be updated by watchers?

          Mark Miller added a comment -
                if(collection !=null && collection.isExternal()  ){
                  log.info("Registering watch for external collection {}",cd.getCloudDescriptor().getCollectionName());
                  zkStateReader.addCollectionWatch(cd.getCloudDescriptor().getCollectionName());
                }
          

          That is the part I'm curious about - I don't know what internal, external means.

          Here is what makes sense to me for a JIRA issue named "Make one state.json per collection" and the previous discussions we have had.

          Storing all collections in clusterstate.json will be considered clusterstate stateFormat=1.
          Storing each collection in its own clusterstate.json will be considered clusterstate stateFormat=2.

          I looked at your patch and it still leaves tons of internal, external stuff. It's all very, very confusing. If I had reviewed this before it was committed, I would say it's not ready.

          In which cases do we make watches? In which cases don't we? I would oppose changing the default behavior to stop using watches, but without spending a lot of time with the code, it's not very clear how the new stuff works vs the old.

          Some notes on your new patch:

          In your patch you have made the change:

          -        if (coll.isExternal()) {
          +        if (coll.getStateVersion()>1) {
          

          Why do we have getStateVersion and stateFormat? Why > 1 and not =2?

          log.info("Creating collection with stateFormat=3: " + collectionName);

          stateFormat=3?

          Back to this issue overall:

          This adds a lot of confusing methods and uses a lot of confusing terminology, none of which is defined. Most of the new methods are also not documented and there are few comments to help explain what is going on in the new stuff at a high level. I think this really muddles a bunch of user API's.

          I'd like to understand more about the watcher situation, but I think there is still a lot of clean up to do here and I'll look at making a patch next week.

          Mark Miller added a comment - edited

          externalWatchedCollections

          What are these? Could we get a comment to describe them?

          getExternCollectionFresh

          What is this? Again, I think we need something better than Extern(sp?) and Fresh, but the method itself also has no documentation.

          * <b>Advance usage</b>

          The project tends to use Expert or Expert method, and a lot of these new methods should probably have been marked lucene.experimental. Also, appears to be a typo.

          private Map<String , DocCollection>

          if(zkStateReader.ephemeralCollectionData !=null ){

          return cs.getCommonCollection(coll);

          A lot of your code that gets committed has odd formatting - would be great to use one of the eclipse or intellij code profiles we have for formatting.

             * This method can be used to fetch a collection object and control whether it hits
             * the cache only or if information can be looked up from ZooKeeper.
          

          I brought this up before and it didn't seem to be addressed in your patch? I don't know how a user can understand this - you could always get the latest collection info by doing zkStateReader#updateState and then getCollection. What does this method offer that is different and worth the API ugliness?

              } catch (InterruptedException e) {
                throw new SolrException(ErrorCode.BAD_REQUEST, "Could not load collection from ZK:" + coll, e);
              }
          

          When you catch an InterruptedException you should call Thread.currentThread().interrupt() to reset the flag.
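
          For example, something along these lines (a sketch of just the fix, keeping the quoted code otherwise as-is):

            } catch (InterruptedException e) {
              // Restore the interrupt flag before translating the exception, so callers
              // further up the stack can still observe that the thread was interrupted.
              Thread.currentThread().interrupt();
              throw new SolrException(ErrorCode.BAD_REQUEST, "Could not load collection from ZK:" + coll, e);
            }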

            /**This is not a public API. Only used by ZkController */
            public void removeZKWatch(final String coll){
          

          Should be marked internal then. Not a great design that has public internal methods on public objects though.

          Mark Miller added a comment - - edited

          public Map ephemeralCollectionData;

          This is not thread safe and ZkStateReader needs to be.

          Also, -1 on this public variable on ZkStateReader. It's a bad design smell for good reason.

          Mark Miller added a comment -

          I also think this expanded usage of zkStateReader in ClusterState is really the wrong thing to do. If anything, we should pull the reader that was there already out.

          ClusterState is an immutable object that is simply meant to hold the cluster state - these zkStateReader calls in it and methods like getCached* are a complete violation of the class's spirit and intent.

          There is a lot to do on this patch IMO. I would prefer we take it out and get it in shape on a branch before it's too hard to take out. As it is, it's really not fit to go in, and the longer it hangs around in 5, the harder it will be to pull it out if it's not improved.

          Mark Miller added a comment -

          I have to make an official -1 veto vote on this issue.

          The issue is to break clusterstate.json into one per collection instead of a global one. This format change in ZooKeeper should not ripple in such an ugly manner through all of our cloud API's. We should also be dropping the global clusterstate.json in 5, so there is no way any of these crazy API changes should exist. If anything, the new API's should be clean and we should have uglier back compat stuff.

          As it is, it seems like things were hacked in the shortest route possible - and while that's great for a prototype or straw man impl, it's a horrendous direction for all these API's.

          There is a lot of general clean up to do, but more importantly, there are problems we need to solve without muddling up all the cloud API's.

          The way things are done now, there is not even a reason for stateFormat=1 or 2 - that was clearly just jammed on top of what was going on anyway, with little thought or integration.

          I have to veto this, it's way too crazy still. The work needs to be done to keep this from bubbling through all these API's so terribly, and trunk is not the place to do it. It's too far from ready and will cause too much pain with 4x backports.

          Noble Paul added a comment - - edited

          Before I comment further, I would like to lay out the objective and design of this ticket.

          Splitting out the main clusterstate.json was not done just to make each state smaller, but also to have fewer nodes process the state changes. If you have a large number of collections, most of the nodes will never need to know about most of the other collections. So it is not a good idea to have an explosion in the number of watchers just because we split the monolithic clusterstate; I would say it actually would be bad for scalability.

          So the approach is: whenever a node has a core which is a member of a collection, register a watch for the state of that collection, and when the last core that is a member of that collection is unloaded, remove the watch. Nodes that don't watch other collections can still fetch their state directly on demand.
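
          As a rough sketch of that idea (plain ZooKeeper API, hypothetical helper names, not the actual patch code):

            // Sketch only: watch a single collection's state znode and refresh a local cache
            // whenever it changes; stop re-registering once the collection is dropped.
            // needs: org.apache.zookeeper.*, java.util.*, java.util.concurrent.ConcurrentHashMap
            private final Set<String> watchedCollections =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

            void watchCollectionState(final ZooKeeper zk, final String coll) throws Exception {
              if (!watchedCollections.add(coll)) return;          // already watching
              final String path = "/collections/" + coll + "/state.json";
              Watcher watcher = new Watcher() {
                public void process(WatchedEvent event) {
                  if (!watchedCollections.contains(coll)) return; // last core unloaded, stop
                  try {
                    byte[] data = zk.getData(path, this, null);   // read and re-register watch
                    updateCachedState(coll, data);                // hypothetical cache update
                  } catch (Exception e) {
                    // log and retry in real code
                  }
                }
              };
              updateCachedState(coll, zk.getData(path, watcher, null)); // initial read + watch
            }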

          The fact that the main clusterstate.json may not have all the collections means that it cannot hold all the states within it. Though the actual data inside the object does not change, the returned values can be different based on different situations. The system does not rely on the immutability of ClusterState anywhere; it just expects the method calls to provide the correct data all the time. In fact, it is even plain wrong to hold on to the ClusterState object for later use because the data could change at any time. The only place where we reuse the object is inside Overseer, where we know that the state is only modified by itself. ZkStateReader is the class that is responsible for keeping the states up to date, and the best way to make the ClusterState APIs behave consistently is to have a reference to ZkStateReader.

          The latest patch is not a final one. The objective of that patch was to get the naming right.

          Can you give a concrete example of how having that info(stateFormat) there will be useful? It still makes no sense to me.

          Though it is possible to deduce the format from where the data is fetched, it would then be the onus of the code that serializes/deserializes the object back and forth from JSON to get the format correct. If it is stored within the object, that is automatically taken care of. If we introduce a new stateFormat=3, the system will already have collections made with stateFormat=2 that do not explicitly say so, so we would have to put some logic into the serialization/deserialization code to differentiate 2 from 3. So it is just safer and more consistent to encode the version of the format with the payload itself. That is the convention all serialization/deserialization libraries follow.
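
          To illustrate the general convention being described (hypothetical names, not Solr's actual parsing code): a deserializer that finds the format version inside the payload can branch on it without knowing which znode the bytes came from.

            // Sketch only: dispatch on a format marker embedded in the payload itself.
            static Object readCollectionState(Map<String, Object> parsedJson) {
              Number fmt = (Number) parsedJson.get("stateFormat");
              int stateFormat = (fmt == null) ? 1 : fmt.intValue(); // old payloads carry no marker
              if (stateFormat >= 2) {
                return readPerCollectionFormat(parsedJson);         // hypothetical reader
              }
              return readSharedClusterstateFormat(parsedJson);      // hypothetical reader
            }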

          getExternCollectionFresh, getCommonCollection etc. are gone in the latest patch. The objective was to get rid of external/internal from the public APIs. The internal variables still contain the 'external' moniker for lack of better names; I hope that is not a problem. We can always add javadocs to make them clearer.

          Actually I use the IntelliJ project formatting after doing an 'ant idea'. I will have to investigate if/why it is different.

          Why do we have getStateVersion and stateFormat? Why > 1 and not =2?

          If I introduce a new format 3, will it not include 2 also? That is just future proofing.

          This is not thread safe and ZkStateReader needs to be.

          Also, -1 on this public variable on ZkStateReader. It's a bad design smell for good reason.

          This has to be threadsafe. Will fix

          I'll then have to make it a public setter method. Other than that, there is no saner way for Overseer to expose the temporary data. I spent some time figuring out a cleaner way; I will be glad to implement a better suggestion.

          When you catch an InterruptedException you should do Thread.currentThread().interrupt() to reset the flag.

          will do

          removeZkWatch() Should be marked internal then. Not a great design that has public internal methods on public objects though.

          I would have preferred them to be package-local, if possible. Unfortunately Java does not allow it. What other choice do we have?

          Mark Miller added a comment -

          The latest patch is not a final one .

          I have to stick with my veto - the API changes are way too crazy. There is no quick fix here. I'm going to insist on my code veto and that this gets on track in a branch.

          The objective of that patch was to get the naming right .

          But it was barely even a start. However, the names are not even the problem, which is why this needs way more work. Even if we rename the methods, it's still all super crazy compared to what we have, and straddling two worlds in a way that both are ugly and the combination is just whacked. I realize this was done because doing it while keeping sensible API's is harder given what you would like to do, but that's not a good enough reason.

          I'm also not sold on the new watch approach. I am sold on splitting up clusterstate.json, but you have tied a lot of other stuff into this commit that is much more controversial.

          I'm sticking to my code veto and this should be reverted and moved to a branch.

          Noble Paul added a comment - - edited

          Hi Mark Miller, I'm working on a reverse patch for this, and we can move further development to a dedicated branch.

          Meanwhile I would like to know what your concerns are on the following

          • The idea of splitting the clusterstate.json itself
          • The external interface. The public API, the changes we make to the zk nodes (I mean all the public things that impact the user)
          • The idea of selectively watching states, and other implementation details, or any other particular solutions which you think are better
          • The API's which are 'undesirable'

          I would like to work towards a consensus and resolve this

          Mark Miller added a comment -

          Thank you.

          The idea of splitting the clusterstate.json itself

          I have no problem with this.

          The external interface

          I've already gone into a lot of the points. We can dig more into what is not clear from the above.

          The idea of selectively watching states

          This is also probably fine, though I'm not sure it's right to change it by default, and I'm not sure it should be so tied into "The idea of splitting the clusterstate.json itself". Breaking things into parts makes it easier to digest and build and document properly.

          The API's which are 'undesirable'

          I go into that above - again, I can answer specific questions. Look at all the get collection methods - look at all the crazy different behaviors depending on what you call - look at the lack of documentation. Future developers will be hopelessly lost. Anyway, I've brought up enough issues above to get started on understanding what some of the current problems are. If you look at the API's now, you can see it's just a mess. It all seems to work okay, and that is good, but it needs to be done thoughtfully as well, and I don't think anyone can easily deal with API's as they are.

          I would like to work towards a consensus and resolve this

          I'm sure that can be done - I do think there is a lot to do and it's too core to rush it in.

          I think a good approach would be to break it up and do things in discrete parts - e.g. splitting up clusterstate.json seems independent of a lot of these other changes. That part is not the most important part though - mostly, we have to get to some well documented API's that make sense - especially on 5x where we don't even necessarily have back compat concerns.

          Noble Paul added a comment -

          The external interface

          I've already gone into a lot of the points. We can dig more into what is not clear from the above.

          The stateFormat attribute in the collection, and the state node having the .json suffix. Please add anything I missed.

          The idea of selectively watching states

          This is also probably fine, though I'm not sure it's right to change it by default, and I'm not sure it should be so tied into "The idea of splitting the clusterstate.json itself". Breaking things into parts makes it easier to digest and build and document properly.

          I'm not sure if it is possible to split them completely. The moment I split the states, my choices are:

          1. all nodes watch all the collections
          2. nodes selectively watch collections
          3. nodes watch no collections and read them all real time

          One or more of these three solutions needs to be built into the system, and I have only added the 2nd because I thought only that one would be useful. Do you think the other solutions are worthwhile to build, or can you think of a better solution that I may have missed?

          The API's which are 'undesirable'

          I would take another look at these. Meanwhile, if you can visualize what the API's should look like, please post them here.

          Timothy Potter added a comment -

          Thought I'd add my 2 cents on this one as I've worked on some of this code and want to get a better sense of how to move forward. Reverting and moving out to a branch sounds like a good idea.

          In general, I think it would be good to split the discussion about this topic into 3 sections: 1) overall design / architecture, 2) implementation and impact on public API, 3) testing. Moving forward we should start with identifying where we have common ground in these areas and which aspects are more controversial and need more hashing out between us.

          Here's what I think I know but please correct where I'm off-base:

          1) Overall Design / Architecture

          It sounds like we're all on-board with splitting cluster state into a per-collection state znode. Do we intend to support both formats or do we intend to just migrate to the split approach? I think the answer is the latter, that going forward, SolrCloud will keep state in a separate znode per collection.

          Noble's idea is that once the state is split, then cores only need to watch the znode for the collection/shard it's linked to. In other words, each SolrCore watches a specific state znode and thus does not receive any state change updates for other collections.

          In terms of what's watched and what is not watched, this patch includes code from 5474 (as they were too intimately tied together to keep separated) which doesn't watch collection state changes on the client side. Instead the client relies on a stateVer check during request processing and receives an error from the server if the client state is stale. I too think this is a little controversial / confusing and maybe we don't have to keep that as part of this solution. It was our mistake to merge those two into a single patch. We originally were thinking 5474 was needed to keep the number of watchers on a znode to a minimum in the event of many clients using many collections. However, I do think this feature can be split out and dealt with in a better way, if at all. In other words, split state znodes are watched from server and client side.

          Are there any other things design / architecture wise that are controversial?

          2) Implementation (and API impact)

          This seems like the biggest area of contention right now. The main issue is that the API changes still give the impression of two state tracking formats, whereas we really only want one format.

          The common ground here is that there should be no mention of "external" in any public method or state format for that matter, right?

          Noble: Assuming we're moving forward with stateFormat == 2 and the unified /clusterstate.json is going away, is it possible to not change any of the existing public methods? In other words, we're changing the internals of where state is kept, so why does that have to impact the public API? If not, let's come up with a plan for each change and how we can minimize impact of this. It seems to me that we need to be more diligent about API impacts of this change and focus on not breaking the public view of cluster state as much as possible. It would be helpful to have a bullet list of API impacts that are needed for this so we don't have to scour the patch looking for all possible changes.

          3) Testing

          I just wanted to mention that we've been doing a fair amount of integration testing with 100's of "external" collections per cluster. So I realize this is a big change, but we have been testing this extensively in our QA labs. I only mention this so that others know that we have been concentrating on hardening this feature over the past couple of months. Once we sort out the API problems, I'm confident that this approach will be solid.

          To recap, I see a lot of common ground here and to move forward, we need to move this out to a branch and off trunk where we'll focus on cleaning up the API impacts of this work, support only the split format going forward (with a migration plan for existing installations). We also want to revisit the thinking behind not watching state changes on the client as that wasn't clear in the patch to this point.

          Mark Miller added a comment -

          Thanks for chiming in.

          Do we intend to support both formats or do we intend to just migrate to the split approach?

          We intend to migrate - I think supporting both would be a terrible long term burden. The way the API's are currently designed, this was obviously not taken into account. It's designed as if both things are first class.

          Are there any other things design / architecture wise that are controversial?

          I think that mostly captures it.

          This seems like the biggest area of contention right now.

          Yup - this is what makes me insist on backing this out for now, rather than pushing forward.

          The main issue is that the API changes still give the impression of two state tracking formats, whereas we really only want one format.

          That is part of it, but I also think the API's are somewhat poorly designed from an overall architecture point - it feels to me like shortcuts around more difficult issues to make nice and I think that is the wrong approach. I've got a variety of other issues, for instance, I think it's an abomination the way ZkStateReader has been worked into ClusterState, and I think the documentation is much too weak in areas that would call for it.

          The common ground here is that there should be no mention of "external" in any public method or state format for that matter, right?

          Right. We should have the new format, and the old one is around for back compat. There is no reason to name things in this case. We are pushing forward the architecture, not adding a feature.

          In other words, we're changing the internals of where state is kept, so why does that have to impact the public API?

          That was also my thought - and I think it probably gets sticky due to supporting both formats per collection, but I think we have to tackle those hard problems.

          It seems to me that we need to be more diligent about API impacts of this change and focus on not breaking the public view of cluster state as much as possible.

          I agree - though it's too early to really hold to that. I do think we should be better about labeling unstable API's though. We can't lock ourselves in too early though - it would block a lot of progress, and we tend to be loose with Solr java API's - it's standard that you may have to recompile. We should be sensible though. However, I think a user would have a very terrible time migrating to the new API's. I'm hoping we can back them out before it becomes too much harder to do so.

          To recap, I see a lot of common ground here and to move forward, we need to move this out to a branch and off trunk where we'll focus on cleaning up the API impacts of this work, support only the split format going forward (with a migration plan for existing installations).

          +1

          Thanks Timothy Potter, I am on the same page as you with all of that.

          Mark Miller added a comment -

          Any help needed on pulling this out? I think if we take too long, it's likely to get quite tricky fast.

          Noble Paul added a comment -

          I'm almost there. Probably today or, in the worst case, tomorrow.

          Noble Paul added a comment -

          Patch undoing the commit.

          I'm going to commit this straight away if there is no further comment

          Where r we going to create the new branch? SVN ? git?

          Steve Rowe added a comment -

          Where r we going to create the new branch? SVN ? git?

          Lucene/Solr is developed on SVN. Make a new branch named SOLR-5473 (or solr5473) under https://svn.apache.org/repos/asf/lucene/dev/branches/

          ASF subversion and git services added a comment -

          Commit 1590983 from Noble Paul in branch 'dev/branches/solr5473'
          [ https://svn.apache.org/r1590983 ]

          Creating a branch to sort out SOLR-5473 , SOLR-5474

          Timothy Potter added a comment -

          Hi Mark,

          Just wanted to get your input on the watcher approach I described above:

          In terms of what's watched and what is not watched, this patch includes code from 5474 (as they were too intimately tied together to keep separated) which doesn't watch collection state changes on the client side. Instead the client relies on a stateVer check during request processing and receives an error from the server if the client state is stale. I too think this is a little controversial / confusing and maybe we don't have to keep that as part of this solution. It was our mistake to merge those two into a single patch. We originally were thinking 5474 was needed to keep the number of watchers on a znode to a minimum in the event of many clients using many collections. However, I do think this feature can be split out and dealt with in a better way, if at all. In other words, split state znodes are watched from server and client side.

          Basically, our approach is that each core on the cluster side only watches the state znode for the collection it is participating in, and on the client side, the CloudSolrServer is not watching any state znodes and instead relies on cached DocCollections and stale-state checks on the server side when processing client requests. Do you have any concerns with us moving forward with this approach? Alternatively, the CloudSolrServer on the client side can use watchers on state znodes after dynamically fetching them from the server when a request for a new collection comes in.
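
          A rough sketch of that stale-check flow from the client's point of view (all names here are illustrative, not the actual CloudSolrServer code):

            // Sketch only: attach the cached state version to the request; if the server
            // reports that the version is stale, refresh the cached state and retry.
            Response requestWithStaleCheck(String collection, Request req) throws Exception {
              for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                CachedCollectionState cached = stateCache.get(collection);
                req.setParam("stateVer", collection + ":" + cached.version());
                Response rsp = send(req);                           // hypothetical transport call
                if (!rsp.isStaleState()) {
                  return rsp;                                       // server accepted our view
                }
                stateCache.put(collection, fetchState(collection)); // refresh, then retry
              }
              throw new IllegalStateException("state still stale after " + MAX_RETRIES + " retries");
            }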

          ASF subversion and git services added a comment -

          Commit 1591253 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1591253 ]

          revert SOLR-5473 , SOLR-5474

          Noble Paul added a comment -

          This patch exclusively fixes the following things

          1. Removing the stateFormat variable from clusterstate
          2. Renaming the znode name from 'state' to 'state.json'
          3. Eliminating the external/internal references from code
          4. Cleaning up the extra methods in ZkStateReader added for external collections
          5. CloudSolrServer retries 5 times instead of 2 times in case of STALE_STATE error
          6. Fix the catch clause of the interrupted exception
          7. Make the 'updateNodes' Map in Overseer threadsafe

          Missing pieces, which I'm not sure if/how to fix:

          1. Removing the references to ZkStateReader from ClusterState.java. But I don't think it will be possible. I will be glad to hear how it can be accomplished
          2. The exposure of the ephemeralCollectionData field in ZkStateReader
          3. Watchers for the states

          This patch applies only to the solr5473 branch.

          I would love to hear if anything is missing or anything can be improved

          ASF subversion and git services added a comment -

          Commit 1592774 from Noble Paul in branch 'dev/branches/solr5473'
          [ https://svn.apache.org/r1592774 ]

          SOLR-5473 https://issues.apache.org/jira/browse/SOLR-5473?focusedCommentId=13990714&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990714

          Mark Miller added a comment -

          Thanks - this is very high on my list to contribute to - I'm out for the rest of the week for a wedding, so I apologize ahead of time if there is some delay, but this is a high priority of mine to give feedback on.

          Shalin Shekhar Mangar added a comment -

          I found a problem with this patch in my testing. On creating a collection with stateFormat=2, if configName was not specified and multiple configsets are available in ZooKeeper then collection creation fails with a weird error:

          10481348 [Thread-13] ERROR org.apache.solr.cloud.Overseer  – Exception in Overseer main queue loop
          org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/sm0/state
          	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
          	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
          	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
          	at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:312)
          	at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:309)
          	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
          	at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:309)
          	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.updateZkStates(Overseer.java:310)
          	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:267)
          	at java.lang.Thread.run(Thread.java:724)
          

          This seemed to indicate that creating /collections/collection_name/state failed because /collections/collection_name doesn't exist. I found the problem in OverseerCollectionProcessor.createConfNode method which was logging the problem but continuing without failing. This patch makes it fail loudly with the correct messages in such a scenario. There is a related fix in ZkStateReader.readConfigName to fail in case data read from ZK was null or if no configName attribute was found.
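
          For illustration, the fail-loudly behaviour being described looks roughly like this (a simplified sketch, not the actual patch):

            // Sketch only: instead of logging and continuing when no config can be resolved,
            // fail the operation with a clear error so collection creation stops early.
            String readConfigName(SolrZkClient zkClient, String collection) throws KeeperException, InterruptedException {
              String path = "/collections/" + collection;
              byte[] data = zkClient.getData(path, null, null, true);
              if (data == null) {
                throw new SolrException(ErrorCode.SERVER_ERROR,
                    "No config data found at " + path + " for collection " + collection);
              }
              String configName = ZkNodeProps.load(data).getStr("configName");
              if (configName == null) {
                throw new SolrException(ErrorCode.SERVER_ERROR,
                    "No configName attribute found at " + path + " for collection " + collection);
              }
              return configName;
            }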

          ASF subversion and git services added a comment -

          Commit 1593130 from shalin@apache.org in branch 'dev/branches/solr5473'
          [ https://svn.apache.org/r1593130 ]

          SOLR-5473: Fix OCP.createConfNode and ZkStateReader.readConfigName to fail loudly if no configs were found

          Shalin Shekhar Mangar added a comment -

          The following feedback was shared by Tim Potter offline with me. He is on vacation for a while so I'm pasting it here.

          Overall, I like the "watched collection" vs. "external" as it helps make clear that not all collections are being "watched" in a given server.

          1. Is there a better name for ExternalCollectionsTest? Also, the assertion message doesn't match the code anymore:

          + assertTrue("DocCllection#isExternal() must be true", 
          cloudClient.getZkStateReader().getClusterState().getCollection(collectionName).getStateFormat() > 1);
          

          2. In ZkStateReader, not clear what this does:

          Thread.currentThread().interrupt();

          shouldn't you just call: Thread.interrupted() instead?

          3. Still using "external" in some places in CloudSolrServer, such as: requestedExternalCollections
          4. One of the issues Mark raised was keeping ZkStateReader as a member of ClusterState. This still seems to be the case.

          Noble Paul added a comment -

          Thanks Shalin

          On #1 I have fixed the assertion message

          In ZkStateReader, not clear what this does: Thread.currentThread().interrupt();

          I'm not sure which one is right here. I see Thread.currentThread().interrupt() in other places but I feel it should be Thread.interrupted()

          One of the issues Mark raised was keeping ZkStateReader as member of ClusterState

          The problem is that we have a hybrid system now, until we completely move to stateFormat=2. When we move completely to that format, ClusterState.java will be completely redundant. So, until then, the choices are:

          1. Eliminate ClusterState.java and use ZkStateReader directly, which means non-trivial changes to the entire SolrCloud code
          2. Keep a reference to ZkStateReader and let all the APIs such as getCollection() work, irrespective of where the collection state is kept

          I have opted for the 2nd choice because it was less invasive and we wanted to support the old format for some more time

          I would like to hear more from Mark Miller and others what would be the preferred direction

          Jessica Cheng Mallet added a comment -

          I'm not sure which one is right here. I see Thread.currentThread().interrupt() in other places but I feel it should be Thread.interrupted()

          Thread.currentThread().interrupt() is the right thing to do. This blog entry gives a brief explanation: http://michaelscharf.blogspot.com/2006/09/dont-swallow-interruptedexception-call.html.

          Mark Miller added a comment -

          Overall, I like the "watched collection" vs. "external" as it helps make clear that not all collections are being "watched" in a given server.

          If we want to make such a distinction in the code, I don't think it makes any sense to use external still. Something along the lines of "watched collection" and "unwatched collection" should be fine

          Shalin Shekhar Mangar added a comment -

          I'm not sure which one is right here. I see Thread.currentThread().interrupt() in other places but I feel it should be Thread.interrupted()

          As Jessica said, Thread.currentThread().interrupt() is correct because it restores the thread's interrupt status so that callers further up the stack can detect it, whereas Thread.interrupted() just returns whether the thread was interrupted and clears the interrupted flag (which has already been cleared by the time the exception is caught).
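
          A small self-contained example of the difference (illustrative only, not from the patch):

            // Prints "interrupted? true" because the catch block restores the interrupt status;
            // swallowing the exception (or calling Thread.interrupted()) would leave it false.
            public class InterruptDemo {
              public static void main(String[] args) throws Exception {
                Thread worker = new Thread(new Runnable() {
                  public void run() {
                    try {
                      Thread.sleep(10000);
                    } catch (InterruptedException e) {
                      // the flag was cleared when the exception was thrown; restore it
                      Thread.currentThread().interrupt();
                    }
                    System.out.println("interrupted? " + Thread.currentThread().isInterrupted());
                  }
                });
                worker.start();
                worker.interrupt();
                worker.join();
              }
            }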

          Noble Paul added a comment -

          If we want to make such a distinction in the code,

          The "external collection" references are gone, except for some internal method-level variables.

          ASF subversion and git services added a comment -

          Commit 1603938 from Noble Paul in branch 'dev/branches/solr-5473'
          [ https://svn.apache.org/r1603938 ]

          latest patch applied to trunk for SOLR-5473 SOLR-5474

          Noble Paul added a comment - - edited

          Patch updated to trunk, incorporating most of the comments:

          1. All external references are eliminated from the APIs
          2. The node is given the suffix /state.json instead of "/state"
          3. Removed the redundant external/stateVersion attribute from the state object. The version is automatically derived from the znode from which the object is read
          4. Thread-safety issues addressed
          5. Added javadocs

          (and many more other subtle cleanups)

          The comments which are not addressed are

          1. The selective watching of collection nodes by Solr nodes. There are only three choices when it comes to watching states (a sketch of the selective option follows this list):
            • Watch all nodes: this would be equivalent to, or worse than, the current clusterstate.json solution. All nodes would be notified of every state change (multiple times, once for each collection of which the node is a member).
            • Watch none: fetch the state data just in time (which would kill ZK) or cache it, which means the node will not have an up-to-date state to make the right decision at the right time.
            • Watch selectively: this is the approach we have taken here.
          2. Maintaining the ZkStateReader reference in ClusterState. Agreed, that is not elegant. The ideal solution would be to completely get rid of ClusterState.java because that node is going to go away, and we will only have ZkStateReader and DocCollection and nothing in between. The problem is we have clusterstate.json now and it will exist for at least a couple of releases. So I am torn between the choices, and I decided to go with the not-so-elegant choice of ClusterState keeping a reference to ZkStateReader, so that all APIs work fine. My suggestion is to eliminate ClusterState.java when we deprecate the old format.
          3. The ephemeralCollectionData in ZkStateReader. This is again not so elegant, but it is simple and performant and has minimal impact on the code. I'm happy to hear any other simpler ideas to make it better.
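
          To make the "watch selectively" option concrete, here is a minimal conceptual sketch using the raw ZooKeeper client. This is illustration only, not the ZkStateReader code from the patch; the class, field, and method names are made up, and error handling and caching are omitted.

            import org.apache.zookeeper.WatchedEvent;
            import org.apache.zookeeper.Watcher;
            import org.apache.zookeeper.ZooKeeper;

            // Sketch: a node registers a watch only on the state.json znodes of the
            // collections it actually hosts, instead of on one global clusterstate.json.
            class SelectiveStateWatcher {
              private final ZooKeeper zk; // existing ZooKeeper session (placeholder)

              SelectiveStateWatcher(ZooKeeper zk) {
                this.zk = zk;
              }

              void watchCollection(final String collection) throws Exception {
                String path = "/collections/" + collection + "/state.json";
                byte[] data = zk.getData(path, new Watcher() {
                  @Override
                  public void process(WatchedEvent event) {
                    if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                      try {
                        // re-read the state and re-register the watch
                        watchCollection(collection);
                      } catch (Exception e) {
                        // real code would log and retry
                      }
                    }
                  }
                }, null);
                // parse 'data' into a DocCollection and update the local cache here
              }
            }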

          We have done extensive testing on this patch internally with very large clusters (120+ nodes) and a very large number of collections (1000+). The solr-5473 branch already has this code committed.

          If there are no objections I plan to commit this fairly soon

          Noble Paul added a comment -

          updated to trunk

          Noble Paul added a comment -

          last patch was wrong

          Timothy Potter added a comment -

          Applied patch to trunk and ran through the tests. I'm not sure these are related but the following unit tests are failing for me:

          [junit4] Tests with failures:
          [junit4] - org.apache.solr.cloud.TestCollectionAPI.testDistribSearch
          [junit4] - org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
          [junit4] - org.apache.solr.cloud.OverseerTest.testOverseerFailure

          A few minor points about the latest patch:

          CloudSolrServer, line 233 - should use the static STATE_VERSION var here
          CloudSolrServer, line 610 - should rename the requestedExternalCollections local var to something like requestedCollections since the concept of external doesn't apply anymore

          ZkStateReader - wondering if we can't just have watchedCollectionStates with proper handling of null entry value instead of maintaining watchedCollections and watchedCollectionStates? Seems like we could just use the key set of watchedCollectionStates in place of watchedCollections, but maybe that is too subtle?

          ZkStateReader, line 273 - might want to make this a debug vs. info message? Seems like that could happen often and really isn't a big deal.

          ZkStateReader, line 463 - Log message should not mention external - log.warn("Error checking external collections", e);

          ZkStateReader, line 485 - I'm seeing a ton of this message "Updating cloud state from ZooKeeper... " in my logs when using many collections now, and it's a virtually worthless message without mention of which collection's state is involved. Of course we don't know which collection this message applies to in this method, so maybe just remove this log message and make sure we log that the state was updated from ZooKeeper when we do have the collection name handy, which I think you've already done on line 849.

          ZkStateReader, line 761 - Would it be better to hide ephemeralCollectionData behind a public method? ZkStateReader is very much exposed on the SolrJ side of things so my thinking is to use a public method with some JavaDoc explaining that this should not be used by client applications.

          ZookeeperInfoServlet -
          I've overhauled the support formerly known as "external collections" in ZookeeperInfoServlet as part of SOLR-5810 and am just waiting for this issue to get committed so I can commit that, so I'm wondering whether you need to include the updates to ZookeeperInfoServlet in this patch at all? The "external" collections aren't showing up correctly in this patch anyway because of line 399: String collStatePath = String.format(Locale.ROOT, "/collections/%s/state", collection);
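
          For reference, the corrected path would presumably just use the /state.json suffix adopted in this patch (illustrative one-liner, not taken from the patch):

            String collStatePath = String.format(Locale.ROOT, "/collections/%s/state.json", collection);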

          SolrZkClient - looks like the instance counting stuff was just there for debugging? Are they still needed ...
          static int instcount = 0;
          public int inst = instcount++;

          Do we want to deprecate ClusterState.java?

          Lastly, most of the above is pretty minor, anything else that stands in the way of this getting committed to trunk?

          Noble Paul added a comment - - edited

          Incorporating Tim's feedback comments

          What is not done

          1. watchedCollectionStates and watchedCollections: one records which collections should be watched and the other stores the actual state. It is still possible to merge these together, but I thought keeping them separate would be cleaner.
          2. Deprecating ClusterState can be done when we arrive at a consensus on how we go forward
          3. SOLR-5810: let's commit this first and then commit SOLR-5810. Now this shows all collections in the single view.
            I don't think there is anything left to be fixed, and if there are no more comments I'll just do some cleanup and commit this.
          Shalin Shekhar Mangar added a comment -

          +1

          There has been a huge amount of testing done in-house with this patch applied. I think it is time we bring this back on trunk.

          Shawn Heisey added a comment -

          I have not looked at the code or what this actually does inside the zookeeper database, but I would imagine that the database does get a little bit bigger. The default ZK database limit of 1MB can already be exceeded by only a few hundred collections ... with this modification, that limit will be exceeded with fewer collections. IMHO, the docs need to point this out.

          Yago Riveiro added a comment -

          Shawn Heisey, I may be wrong, but I think the 1MB limit is per znode, not for the whole ZK database.

          Noble Paul added a comment -

          fixed the admin UI

          Noble Paul added a comment -

          I'm planning to commit this to trunk in a day

          ASF subversion and git services added a comment -

          Commit 1607418 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1607418 ]

          SOLR-5473 one state.json per collection , SOLR-5474 support for one json per collection

          Noble Paul added a comment -

          The latest patch that is committed

          Mark Miller added a comment -

          I see you addressed very little of what has come up. Not even the basic issues like a test named External and the horrendous ZkStateReader in clusterstate.json. This is also still not "Make one state.json per collection", but a bunch of issues all connected to that.

          I maintain my -1 on this commit and again urge that this gets moved to a branch.

          There are so many ugly changes in here, I wish some others would review it closely as well.

          I apologize I was not able to provide feedback sooner. Personal life and work has been a zoo lately.

          Given that I can't contribute much in the near term, I can only ask so much I guess. But at a minimum, I have a firm -1 on the ZkStateReader being part of the cluster state.

          Mark Miller added a comment -

          I already talked about a lot of the problems that still exist, so I won't go over all of them again. But there are lots of odd things, for example:

          + /**
          +  * this is only set by Overseer not to be set by others and only set inside the Overseer node. If Overseer has
          +  * unfinished external collections which are yet to be persisted to ZK
          +  * this map is populated and this class can use that information
          +  * @param map The map reference
          +  */
          + public void setEphemeralCollectionData(Map map){
          +   ephemeralCollectionData = map;
          + }
          

          That is pretty ugly; it still talks about external collections and it's pretty cryptic in general. It appears to basically say: this is a hack for the Overseer on a very user-centric class. We can't just keep shoving in shortcuts and ugliness. If you keep doing that, all the APIs will fall under the weight. This patch adds so much complexity and API pain IMO. I'm fine with getting the end result done, but I'm not fine with not taking the time to get the code and APIs and tests right for it.

          Most of the changes you listed above do not cover the majority of the problems I've talked about above.

          Noble Paul added a comment -

          But at a minimum, I have a firm -1 on the ZkStateReader being part of the cluster state.

          As I said, the objective has to be to eliminate ClusterState.java altogether because clusterstate.json will not exist. But till it exists, I admit, I did not have a better choice.

          it still talks about external collections and it's pretty cryptic in general

          I did miss that; I shall rectify it.

          Mark Miller added a comment -

          But till it exists, I admit, I did not have a better choice.

          IMO, there is no consensus on this patch, so the current commit is not a choice - I maintain my official -1 veto and would like you to revert the commit.

          Mark Miller added a comment -

          Some things are better now.

          Some of the most important things are not though:

          • Still a bunch of references to external collections.
          • The ClusterState/ZkStateReader API/Impl Changes - I think this has to be done cleanly - almost nothing has changed from the first patch. These API changes are my largest objection.
          • Since we don't plan on supporting both modes forever (it's horribly limiting and confusing), we should implement with that in mind. What is getting deprecated, what are the new API's? The new API's should be good.
          • Tests seem minimal for such a large change. Sounds like you guys have done a lot of external testing, but would have been nice to spend some of that time beefing up the current tests in the areas that changed.
          • By tying the other stuff you are doing into "Make one state.json per collection", you are making it hard to think about and implement the latter cleanly IMO.
          • What was committed just seems like a slightly modified version of the original vetoed commit. There are some improvements to some of the minor issues, but the overall code and approach and changes remain.
          Noble Paul added a comment -

          Since we don't plan on supporting both modes forever (it's horribly limiting and confusing), we should implement with that in mind.

          As I said earlier, we need to get rid of ClusterState. But the problem is that, for at least a couple of releases, we need both to co-exist. The first step is to deprecate ClusterState. We can mark it as deprecated and refer to methods in ZkStateReader.
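
          A hypothetical sketch of what that first step could look like (the javadoc wording is illustrative, not from any patch):

            /**
             * @deprecated Collection states are moving to per-collection state.json znodes;
             * new code should use the per-collection accessors on ZkStateReader instead.
             */
            @Deprecated
            public class ClusterState {
              // existing implementation left unchanged while the old format is still supported
            }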

          Tests seem minimal for such a large change

          Almost every SolrCloud test randomly switches between the new/old mode. We have done extensive long running tests with very large clusters (1000 collections, 120 nodes) and very long running performance tests.

          By tying the other stuff you are doing into "Make one state.json per collection",

          The point is, I started with one state.json and soon realized that without all these other changes it is impossible to do it.

          Mark Miller added a comment -

          We have done extensive long running tests with very large clusters

          That only ensures things work today and not tomorrow. Those kinds of tests are important, but just as important are tests that can run again and again over time as we make changes, extend, etc. When you change large core behaviors, it seems crazy there would not be some additional tests added or existing tests expanded more. It's just one point of many I guess. I'm always going to push back on a core change this large if tests are minimally added or expanded. Manual tests ensure things worked that day.

          Almost every SolrCloud test randomly switches between the new/old mode

          There is value there, but not as much as you seem to be banking on. SolrCloud is heavily under tested and just because the current tests pass does not mean many things were not broken. Part of why it's nice to add to the tests when adding large core changes.

          The first step is to deprecate ClusterState.

          If we could just deprecate ClusterState now, why would we have to pass a ZkStateReader to it?

          It's going to take some significant improvements before I can withdraw my veto.

          Noble Paul added a comment -

          If we could just deprecate ClusterState now, why would we have to pass a ZkStateReader to it?

          Even if we deprecate ClusterState, it will still exist. If it exists, and if its APIs are to work as expected, it needs a reference to the ZkStateReader.

          Shalin Shekhar Mangar added a comment -

          Mark, we have to revert it if your -1 stands. But we have already done it once and it hasn't been very productive, because when it comes to your biggest objection of coupling ZkStateReader and ClusterState, neither you nor Noble nor I had a suggestion. If we move it to a branch then again we will be in the same place a month down the line, since as you say you don't have time to help with it.

          This is also still not "Make one state.json per collection", but a bunch of issues all connected to that.

          Yes, but there's no point in having a state per collection without those other changes. Tim wrote very eloquently about what's changed in terms of nodes watching the state and why we think it is necessary.

          Noble said earlier that these hacks in the API were put in place to support back-compat. Having looked at this patch in depth more times than I can remember over the past few months, I agree with Noble that it is difficult to do without it. This API is definitely not the API we want for Solr 5 and I completely agree with you on that. We can refactor and do away with the ClusterState completely on trunk (and we intend to do that in future) but before we do that, a back-compatible version of this change needs to land on branch_4x.

          It is crazy that it takes a commit to get your attention (and veto!) when things can be resolved via discussion and collaboration. Tim and I have been reviewing this patch and we shall continue to work with Noble on improving it but I am afraid that it might be unproductive again because after three committers are comfortable with the approach and commit it to trunk, you veto it without any constructive suggestions on actually improving the APIs.

          IMO, we should continue to iterate on trunk for a while (at least we have jenkins there) and get the APIs right as we want them for Solr 5 and then figure out how to move it to branch_4x in a back-compatible and hopefully non-ugly way.

          Mark Miller added a comment -

          It is crazy that it takes a commit to get your attention

          It's simply when I could push it off no longer. As you have seen, I have not been active in Solr for a while now. It's unfortunate, but it is what it is. Now is when I was able to take a look.

          My point is not simply the ZkStateReader - that's one of the more terrible changes I think, but much worse is what it does to the APIs in general, how messy the change is in general, and how it complicates and weakens the code IMO.

          I have given constructive thoughts on why the APIs are bad and what needs to be done or what might be an idea to try.

          I can't hold your hand down that entire path though - I have my own work schedule and priorities unfortunately.

          I can honestly say these look like really bad changes to me though, and I have to try and preserve SolrCloud where I can. I see more and more tests failing since I left, Noble has a track record of introducing tests that fail a lot with SolrCloud, and a ton of what I brought up was not addressed at all.

          A lot of SolrCloud issues go in that I don't fully support or think is ready 100%. I'm not trying to be a dick to work with. I do think this work is not very good yet and that putting it in trunk will be a huge pain down the line and so I can't let it happen is my position.

          Mark Miller added a comment -

          I'll also reiterate: splitting the clusterstate.json is a large, nice, self-contained issue. If we could do that, I think it would be a lot easier to see how it should be done in a nice way, where it's easy to deprecate and remove an impl later.

          I think because you are trying to tie that issue into this collections scaling issue, you are okay saying that, oh it just has to be ugly and hackey, because it's this whole pile of issues we are solving. I don't think it does have to be ugly or hackey and I think if we let that stay in trunk, it will haunt us like a lot of other code we have sometimes let in too easily.

          Finally, splitting the clusterstate by collection has not been a contentious issue. Some of the other things you are doing around watchers and caching is more contentious. We don't have full agreement on them, we never have, and it really feels like they are coming in as extra pork on a bill with the JIRA title "make one state.json per collection".

          Noble Paul added a comment -

          My point is not simply the ZkStateReader - that's one of the more terrible changes I think, but much worse is what it does to the APIs in general, how messy the change is in general, and how it complicates and weakens the code IMO

          I have been trying to explain why I am doing it this way. I fully know it is not the most elegant way. But not breaking back-compat was a real challenge.

          I have given constructive thoughts on why the APIs are bad and what needs to be done or what might be an idea to try.

          I'm willing to explore any suggestions. All other reviewers have been of a similar opinion that there are limitations with the way the system works. Introducing this feature in a back-compat way has been quite a challenge. When we are ready to break that, we will have the liberty to clean it up completely. But we have to take the first baby steps somewhere.

          and a ton of what I brought up was not addressed at all.

          It is not that I don't want to address them. Despite my best attempts I could not find a better way. I have asked others and they could not suggest solutions either.

          Noble Paul added a comment - - edited

          Finally, splitting the clusterstate by collection has not been a contentious issue. Some of the other things you are doing around watchers and caching is more contentious

          I would like to hear a solution where you just split the clusterstate without ending up with performance worse than the single clusterstate.json. It is not worthwhile to split if we are degrading performance.

          I'm doing it because our users are really facing scalability problems with a large number of collections.

          Shalin Shekhar Mangar added a comment -

          I think because you are trying to tie that issue into this collections scaling issue, you are okay saying that, oh it just has to be ugly and hackey, because it's this whole pile of issues we are solving. I don't think it does have to be ugly or hackey and I think if we let that stay in trunk, it will haunt us like a lot of other code we have sometimes let in too easily.

          Finally, splitting the clusterstate by collection has not been a contentious issue. Some of the other things you are doing around watchers and caching is more contentious. We don't have full agreement on them, we never have, and it really feels like they are coming in as extra pork on a bill with the JIRA title "make one state.json per collection".

          As Noble explained earlier, there's really no point in splitting cluster state by collection if we were not trying to scale to a large number of collections. They are the same issue. We aren't doing this because we want to work around ZK size limits for clusterstate.json. We are trying to make large clusters possible which have thousands of collections. There is really no point in splitting the cluster state per collection and multiplying the number of watchers in the system by the number of collections. Even if ZK and SolrCloud scales to that limit and I don't know if it would, it is just wasteful and not required at all.

          Mark Miller added a comment -

          There is really no point in splitting the cluster state per collection and multiplying the number of watchers in the system by the number of collections.

          It would be a step along the path you are seeking it seems for starters - a step with consensus.

          Also, I think of course there is a point, and there are users that were not very happy when we switched away from having a ZK node per shard, actually. You can have a huge number of nodes and still read changes very quickly. As someone already mentioned, one of the more interesting options would be to be able to define for yourself how many nodes the state was split across, or to what level. One site I know had a pretty crazy amount of watchers back when we had a design closer to this, and it all seemed to work just fine with thousands of shard entries and many more thousands of watchers. All of these different ideas have different tradeoffs. Breaking up by collection is a small improvement and is something that can be built upon. We talked about it way before you guys started working on scaling to many collections. It's not some sort of mega improvement, but it is a step forward and other things you are looking to do can be built on it.

          Mark Miller added a comment -

          Anyway, no one is forcing you to take it a step at a time. It just feels like that's the only way to try and force an approach that is API sane. People are going to begin developing against what goes in right away - building on it, tying it together and up even more. I think the faster we revert it, the easier that is. There is no way I can withdraw my veto. This core part of the code has to be more sensible, not less. The new and old APIs have to be cleanly defined. A user of the APIs can't be so screwed: what to use, what to avoid, what the hell are these two different modes? What APIs are jacked and why?

          How do the APIs need to change to get that ZkStateReader out of ClusterState? You're telling me that's just impossible? I simply don't believe that.

          Noble Paul added a comment -

          It would be a step along the path you are seeking it seems for starters - a step with consensus.

          When we clearly know that the selective watching is more scalable, and it is already implemented, why are we trying a less scalable solution? Just because of the fear of the unknown? The point is I raised this question before the first rollback in this ticket about the 3 options we have: 1) watch all, 2) watch selectively, 3) watch none. I didn't get any response to that. And after 4 months of development we are in the same place again.

          Noble Paul added a comment - - edited

          How do the APIs need to change to get that ZkStateReader out of ClusterState? You're telling me that's just impossible? I simply don't believe that.

          OK, we can change the clusterstate often, and it won't be realtime anymore. Is that OK? I can make that change easily. Are you OK with selective watching?
          I'm soon posting a patch with zkStateReader removed from ClusterState.

          Mark Miller added a comment -

          I'm still calling for a revert. People will start committing issues and make taking it out difficult and error-prone. I've already had to convert some of my own work to this state and then back from it; the last time, it took a couple of days to take it out.

          Removing zkStateReader is a key step, but one of many issues I'm trying to get across. The APIs in general are the problem. No user or developer that did not work on this can sanely deal with these changes IMO. The APIs are intermingled, there is no idea that there will be a deprecation and evolution; it's still, as I've said more than once above, implemented simply like an option. There are awkward APIs, it's not clear what the modes are, where the code that separates the modes is divided, etc., etc.

          I still think this looks early.

          Mark Miller added a comment - - edited

          Just because of the fear of the unknown?

          For fear of the hellish APIs that devs and users will be stuck with. I'm sorry if I was not clear about that. And for fear of what you are affecting that you don't understand. To do this kind of work and not add or mess with many tests looks very scary from where I am sitting. We need more effort on stabilizing and investigating existing test fails, more work on adding tests for missing and changed areas - much more than we need this kind of destabilizing change when the APIs are half-baked and detrimental to the code base if people start building on them.

          The point is I raised this question before

          And I've talked about it before - probably more than once. There are tradeoffs in the approaches. I told you at a minimum that I would certainly be open to having the option to do selective. And that if you wanted to change to selective by default, it seems that you should have to specifically investigate and argue that the change is better by spelling out the nuances of the change, the benefits and the losses, etc. I didn't say it could not be done, I said it did not seem like we came to a consensus.

          It feels like everything we have discussed from the start and I had issues with, you have essentially done it all as you planned anyway, and anytime I object, you go back and do some cosmetic changes and roll back the same thing. You tell me, the big stuff you want changed, too hard, impossible, the small stuff, done and done (with half not done, like all the external mentions still there).

          There is no consensus IMO, not until I'm convinced by more voices that I'm smoking crack, and a vetoed commit cannot be committed again until a veto is withdrawn.

          ASF subversion and git services added a comment -

          Commit 1607587 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1607587 ]

          reverting SOLR-5473 , SOLR-5474

          Noble Paul added a comment -

          I have reverted the changes

          We are working on alternative approaches.

          There is no consensus IMO, not until I'm convinced by more voices that I'm smoking crack

          Consensus cannot be reached without feedback and collaboration. I hope next time we will do it better.

          Mark Miller added a comment -

          I'm personally frustrated as well. I feel like I've been giving feedback on this for a ton of months now and that it hasn't affected anything but variable names and minor cleanup.

          I'm sorry recently was not a good time - I had vacation, work trips, and a lot of work and personal issues stacked on me. I only have the time I have to contribute. It feels like the time I put in early, trying to get my concerns in before too much work was underway, was ignored anyway though. This is exactly what was planned from day one, which I had objections to, just slightly cleaned up.

          I still think these API's need to be done right. I think such minimal effort has gone into that so far, that I don't feel very bad blocking this. All the work has gone into making it work (which is a big step, and I do think from that perspective, it's good stuff), and very little has gone into making the code and API changes sensible, or into expanding the tests in the areas that are being changed.

          I also think this issue is severely mistitled, and there is not a lot of clarity on how you are changing cluster state caching and watchers and what pros and cons that change has.

          Like I said, a user (and these are user API's to explore the clusterstate too) or a developer will be lost trying to deal with this. I think it needs to be addressed in code, but even if that didn't happen, a ton more could certainly be addressed with comments and documentation that is not.

          Noble Paul added a comment -

          This is not a full patch. Consider it as just pseudo-code for an alternate approach. These are the core pieces of the changes to the APIs. I mean, at the API level there will be no changes beyond this POC.

          • Selective watches are still done
          • ClusterState has no reference to ZkStateReader.

          I did not want to bog you down with the full patch nor do I want to put in a lot of effort going down the wrong path. Please let me know if you are fine with the approach. If yes, I shall give a full patch shortly

          Shalin Shekhar Mangar added a comment -

          I think that's better but I don't like:

          /** This is not a public API. Only used by ZkController */
          public void removeZKWatch(final String coll) {
            synchronized (this) {
              watchedCollections.remove(coll);
              clusterState = clusterState.copyWith(Collections.<String, DocCollection>singletonMap(coll, null));
            }
          }

          If it's not supposed to be public API then it shouldn't be public. I think we should change this to an event listener model so that watching/un-watching can be done automatically. Mark Miller - can you please take a look as well?

          Mark Miller added a comment -

          Let’s forget the code for a minute. I have to review that, but I don’t want to just say the approach looks okay and you come back with a patch an hour later and are ready to commit. I don’t doubt we can come to agreement on how to handle the API.

          I want to talk about cached clusterstate and watchers. Because, apparently, that is really what this issue is, and the title of this issue is an irrelevant implementation detail towards scaling to thousands of collections.

          What are the limitations in the new system? What are the differences with the old system?

          If you are monitoring a cluster of many collections in the Solr admin UI and nodes die and take out collections that are not hosted on the admin UI node, when will you see this reflected by the cloud visualization tab? Ever? If there are new limitations or changes, are they documented?

          Also, if we are going to move to this different core way of doing things, do we yet fully understand all of the changes and implications, or is this just build something to scale to many collections and let the rest fall out as it may? I honestly do not know yet, but if everything is understood, let’s get it all listed in one place here and come to agreement on the path forward.

          When changing such a core part of this system, and I know I’ve said it before, it would be great to expand on the minimal tests we have for a lot of this behavior. It’s part of paying for the change IMO - get your hands dirty on some of the core tests or lack of tests and help fill out the skeleton SolrCloud testing that has been put in place. We are short on unit tests especially.

          And can we change the title of this issue to reflect its full scope?

          Noble Paul added a comment -

          I don’t doubt we can come to agreement on how to handle the API.

          I know. We just need a few iterations.

          I want to talk about cached clusterstate and watchers. Because, apparently, that is really what this issue is

          I'm glad that we are discussing this. This is the meat of the issue. And it is EXTREMELY IMPORTANT that YOU agree with this aspect of the issue and all the changes involved in it.

          If you are monitoring a cluster of many collections in the Solr admin UI and nodes die and take out collections that are not hosted on the admin UI node, when will you see this reflected by the cloud visualization tab?

          All changes will be visible in realtime. The point is that nodes NEVER cache any states (only SolrJ does, see SOLR-5474). Nodes watch only the collections of which they are members. Other states are always fetched just in time from ZK.

          Also, if we are going to move to this different core way of doing things, do we yet fully understand all of the changes and implications,

          Yes, the changes are quite simple (a rough sketch of the watch bookkeeping follows the list below):

          1. Nodes will only watch collections of which they are members
            • When a core is registered, add a watch for that collection (don't add duplicates)
            • When a core is unregistered, remove the watch (only if no other core needs that collection)
          2. Don't watch anything else
          3. Hit ZK, in realtime, for any data that is not watched
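
          As referenced above, here is a minimal sketch of that reference-counted watch bookkeeping. It is illustrative only: the class and method names are hypothetical and this is not the code from the patch.

            import java.util.HashMap;
            import java.util.Map;

            // Sketch only: add a watch for the first local core of a collection and
            // remove it when the last local core of that collection goes away.
            class CollectionWatchRegistry {
              // collection name -> number of local cores that belong to it
              private final Map<String, Integer> refCounts = new HashMap<>();

              /** Called when a core of this collection is registered on the node. */
              synchronized void coreRegistered(String collection) {
                int count = refCounts.getOrDefault(collection, 0);
                if (count == 0) {
                  addZkWatch(collection);      // first local core: start watching /collections/<name>/state.json
                }
                refCounts.put(collection, count + 1);
              }

              /** Called when a core of this collection is unregistered from the node. */
              synchronized void coreUnregistered(String collection) {
                Integer count = refCounts.get(collection);
                if (count == null) return;
                if (count <= 1) {
                  refCounts.remove(collection);
                  removeZkWatch(collection);   // no local core left: stop watching and drop the cached state
                } else {
                  refCounts.put(collection, count - 1);
                }
              }

              private void addZkWatch(String collection) { /* set a watch on the collection's state.json */ }

              private void removeZkWatch(String collection) { /* remove the watch */ }
            }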

          And can we change the title of this issue to reflect its full scope?

          I shall do it

          I am glad to add more tests. We can open tickets and target them one by one.

          Noble Paul added a comment -

          Mark Miller If you are fine with the watch stuff, I shall go ahead with a complete patch with the new approach outlined in the latest patch.

          Mark Miller added a comment - - edited

          ClusterState has no reference to ZkStateReader.

          +1 on that part, but it doesn't seem to address much else, so I don't have too much to say.

          All changes will be visible in realtime. The point is that nodes NEVER cache any states (only SolrJ does, see SOLR-5474). Nodes watch only the collections of which they are members. Other states are always fetched just in time from ZK.

          It sounds like what I said is an issue? You can easily be on a node in your cluster that doesn't have part of a collection. If you are using the admin UI to view your cluster and a node from another collection goes down, will that reflect on the solr admin UI you are using that doesn't host part of that collection? I think this is a big deal if not, and nothing in the patch addresses these kinds of issues for users or developers. You are telling me all the behavior is the same? I don't believe that yet.

          Noble Paul added a comment -

          You can easily be on a node in your cluster that doesn't have part of a collection. If you are using the admin UI to view your cluster and a node from another collection goes down, will that reflect on the solr admin UI

          For all collections that this node is not a part of, the states are fetched in realtime from ZK. So whatever you see in the admin console will be the latest state. The full patch already takes care of this.
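
          As a rough illustration of that just-in-time fetch (a sketch only, not the actual ZkStateReader code; the helper class is hypothetical), the state of a non-watched collection can simply be read from its znode on demand:

            import org.apache.zookeeper.ZooKeeper;
            import org.apache.zookeeper.data.Stat;

            // Hypothetical helper: fetch the latest state of a collection this node does not host.
            // No watch is set, so nothing is cached locally and nothing has to be invalidated later.
            class LiveCollectionStateFetcher {
              private final ZooKeeper zk;

              LiveCollectionStateFetcher(ZooKeeper zk) {
                this.zk = zk;
              }

              byte[] fetch(String collection) throws Exception {
                Stat stat = new Stat();
                // Path layout from this issue: /collections/<name>/state.json
                byte[] data = zk.getData("/collections/" + collection + "/state.json", false /* no watch */, stat);
                // stat.getVersion() is the znode version a client can compare against later
                return data;
              }
            }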

          Mark Miller added a comment -

          For all collections that this node is not a part of, the states are fetched in realtime from ZK.

          Okay, great, that's off the table then.

          So the only real trade off is that a leader might not learn about a state change until a request fails rather than being alerted when the state change happens?

          Noble Paul added a comment -

          So the only real trade off is that a leader might not learn about a state change until a request fails rather than being alerted when the state change happens?

          A leader is always a part of a collection and is always notified of state changes.

          OTOH SolrJ will not learn about state changes till it makes a request

          Mark Miller added a comment -

          Okay, now we are getting somewhere.

          So the important sum of these changes can be listed as:

          • Nodes directly talk to zk when dealing with collections they don't host and do not watch those collections in zk.
          • SolrJ does not update its cached cluster state unless a request to a node fails?

          How much does this complicate things for those that try and write a cloud aware client in other languages?

          Noble Paul added a comment -

          How much does this complicate things for those that try and write a cloud aware client in other languages?

          This is just an optimization in CloudSolrServer to minimize watches. The caching is done inside the CloudSolrServer class. The server is not aware of the cache.

          The only thing the server does is handle the stateVer param sent in the request (it contains the collection name and the znode version the client has in its cache). If the assumed collection name and version are wrong, the server throws an error (STALE_STATE). Any other-language client that wants to do an intelligent watch must send this extra request param. Others can choose a scheme of their own (like watching all states).
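
          A client-side sketch of that protocol might look like the following. The names stateVer and STALE_STATE follow the description above; the real SolrJ code may differ in detail, and the helper methods here are hypothetical.

            import java.util.Map;
            import java.util.concurrent.ConcurrentHashMap;

            // Sketch only: cache per-collection state versions and retry once on a stale-state error.
            class StateVersionedClient {
              // collection name -> cached znode version of its state.json
              private final Map<String, Integer> cachedVersions = new ConcurrentHashMap<>();

              Object request(String collection, Object solrRequest) throws Exception {
                int ver = cachedVersions.getOrDefault(collection, -1);
                try {
                  // Send "stateVer=<collection>:<version>" so the server can validate the cached view.
                  return send(solrRequest, collection, ver);
                } catch (StaleStateException e) {
                  // The server says the cached state is out of date: refetch state.json,
                  // remember the new version, and retry the request once.
                  int fresh = refreshStateFromZk(collection);
                  cachedVersions.put(collection, fresh);
                  return send(solrRequest, collection, fresh);
                }
              }

              private Object send(Object req, String collection, int stateVer) throws StaleStateException { return null; }

              private int refreshStateFromZk(String collection) throws Exception {
                // read /collections/<collection>/state.json and return its znode version
                return 0;
              }

              static class StaleStateException extends Exception {}
            }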

          Mark Miller added a comment -

          Anyway, regardless, if all that holds, I'm sold on that as a change then.

          Let's talk about the code.

          Beyond fixing the ZkStateReader in ClusterState issue, I think we need to work more on making it clear what the two modes are in the code, which API's belong to what mode, what our plans are, and how developers should deal with all this. Someone should be able to very quickly get up to speed on all this so they don't break things, tie together API's that shouldn't be, build much on the old mode, etc. Let's make our intentions clear in the code, by the API's we pick and the comments.

          Mark Miller added a comment -

          Why don't you finish the path of removing ZkStateReader and then let me take a pass of suggestions.

          Noble Paul added a comment -

          Why don't you finish the path of removing ZkStateReader and then let me take a pass of suggestions.

          Yes, this is the best way to iron out the internal APIs. I'll post a full patch shortly based on the new approach.

          Ramkumar Aiyengar added a comment -

          What about nodes which just act as search federators (i.e. they host no data but just distribute searches to shards and collect results back)? There should at least be an option to listen to collections of choice so that you don't have to fetch them for each request.

          Noble Paul added a comment -

          There should at least be an option to listen to collections of choice so that you don't have to fetch them for each request

          Are you talking about Solr nodes? Watching nodes is error prone unless we have clear rules on how to watch/unwatch.

          The strategy should be similar to SolrJ where it watches nothing but caches everything

          SolrDispatchFilter will be enhanced to use caching similar to CloudSolrServer.

          Ramkumar Aiyengar added a comment -

          Got it.

          I wouldn't say I am still convinced that caching till you fail is the same as watching. There are still cases around (unfortunately not reproducible enough to be fixed) where the cluster state tells you for sure that some node is off limits, but the actual request doesn't fail fast enough. More generally, ideally I would like to be able to indicate that a node shouldn't be used for whatever environmental reason even though it physically is up (maybe it is doing something weird and I would like for it to be up and available for debugging while not affecting queries). Currently that's not possible, but it is something we might work to get added. It would be good to have the true ZK state instead of lazily updating it on error.

          watching nodes is error prone unless we have clear rules on how to watch/unwatch

          That's why I am saying that at least in the simplistic case this should be left to configuration – watch none, all, or selected. That would at least open the doors for more sophisticated logic on making the selection smarter, but we shouldn't shut it out and require only caching to be used. Use the watch when you have been told to, else cache.

          Mark Miller added a comment -

          I wouldn't say I am still convinced that caching till you fail is the same as watching.

          I don't believe it is either. I like it less where you don't need thousands of collections. Perhaps we consider making it an optional optimization on CloudSolrServer?

          Given the scaling gains for collections though, this issue overall does seem worth any tradeoffs and it seems like improvements and options can be made where appropriate.

          Noble Paul added a comment -

          Full patch with all tests passing.

          The changes are minimal, with the addition of a constructor to ClusterState.

          Noble Paul added a comment - - edited

          wouldn't say I am still convinced that caching till you fail is the same as watching

          You are right. Caching till you fail is just an optimization in CloudSolrServer.

          In my view the client has no business watching the state at all. The cost of an extra request per stale state is negligible IMHO.

          That's why I am saying that at least in the simplistic case this should be left to configuration – watch none, all, or selected.

          Yes, I'm inclined to add this (selective watch) as an option which kicks in only if the no. of collections is greater than a certain threshold (say 10). Below that threshold, all Solr nodes will watch all states.

          To sum it up, my preference is:

          1. Have SolrJ do caching till it fails or till it times out (no watching whatsoever). Please enlighten me with a case where it is risky.
          2. Solr nodes should choose to watch all or selectively based on the no. of collections or a configurable cluster-wide property.
          Jessica Cheng Mallet added a comment -

          I think there can be a race condition in CloudSolrServer's state-caching if the state is fetched just when a collection is created but none of the replicas have been added. If this state is cached, then until it expires, all the requests will fail-fast with an empty theUrlList. We may need to optionally skip caching if any of the shards has no replicas at all.

          Noble Paul added a comment -

          If theUrlList is empty, it makes sense to invalidate cache and fail.

          Jessica Cheng Mallet added a comment -

          invalidate cache and fail

          I think it should just behave as if it received an INVALID_STATE in the response: reload the zk state and retry.
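
          A minimal sketch of that handling (the surrounding class and helper names are hypothetical, not the actual CloudSolrServer internals): an empty URL list computed from a cached state is treated like a stale-state response, so the cached entry is refreshed before retrying.

            import java.util.Collections;
            import java.util.List;

            // Sketch only: resolve request URLs from the cached state, refreshing once if it looks stale.
            class UrlListResolver {
              List<String> resolve(String collection) throws Exception {
                List<String> urls = urlsFromCachedState(collection);
                if (urls.isEmpty()) {
                  invalidateCachedState(collection);       // drop the possibly-stale cache entry
                  refreshStateFromZk(collection);          // reload /collections/<name>/state.json
                  urls = urlsFromCachedState(collection);  // then retry once with the fresh state
                }
                return urls;
              }

              private List<String> urlsFromCachedState(String collection) { return Collections.emptyList(); }

              private void invalidateCachedState(String collection) {}

              private void refreshStateFromZk(String collection) throws Exception {}
            }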

          Noble Paul added a comment -

          Yes, it should be handled like a STALE_STATE error.

          I was about to put up a new patch

          Noble Paul added a comment -

          Mark Miller Can you look at the latest patch and let me know your comments

          Alan Woodward added a comment -

          I think part of the reason this is so unwieldy is that ClusterState itself is monolithic - you call ZkStateReader.getClusterState() and it goes and gets the state of the entire cluster, and then you typically only need information for a single collection. So ClusterState needs to know about all the different state versions, which bloats it up, and then leaves you with API warts like ZkController being responsible for removing watches.

          What I think should really happen here is that we should add an intermediate layer, CollectionState. This has three implementations, one that is a singleton watching the master clusterstate.json, one that is a separate object with watchers for each 'external' collection, one that just directly fetches data from ZK whenever it's queried. When ZkStateReader starts up (and can we maybe move createClusterStateWatchersAndUpdate() into the constructor?) it works out which CollectionState type it needs for each collection in the cluster. Users of the API just call ZkStateReader.getCollection() and they get the right kind of CollectionState object, no need to have external knowledge of what the state version of the collection is.
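
          A rough sketch of the layered abstraction described above (all names are hypothetical, and this is not what was ultimately committed) might look like:

            // Sketch only: one interface, three ways of tracking a collection's state.
            interface CollectionStateProvider {
              /** Return the current state of the collection, however it is tracked. */
              DocCollectionLike get(String collection);
            }

            // 1. Collections still stored in the shared clusterstate.json: one watched singleton.
            class SharedClusterStateProvider implements CollectionStateProvider {
              public DocCollectionLike get(String collection) { /* read from the watched clusterstate.json */ return null; }
            }

            // 2. Collections with their own state.json that this node hosts: one watcher per collection.
            class WatchedCollectionStateProvider implements CollectionStateProvider {
              public DocCollectionLike get(String collection) { /* return the locally cached, watched state */ return null; }
            }

            // 3. Collections this node does not host: fetch from ZK on every call, no watch, no cache.
            class LazyCollectionStateProvider implements CollectionStateProvider {
              public DocCollectionLike get(String collection) { /* read /collections/<name>/state.json now */ return null; }
            }

            // Stand-in for Solr's DocCollection, so the sketch stays self-contained.
            interface DocCollectionLike {}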

          Having stuck my oar in here, I'm now going offline for a couple of weeks. Maybe this API change should be a separate issue, but I think it should be nailed down before this one is committed.

          Noble Paul added a comment -

          Alan Woodward Is this comment made after seeing the latest patch or is it based on the old one?

          I have implemented it in more or less the way you suggested

          Alan Woodward added a comment -

          Sure, but there's no abstraction here. Overseer has a bunch of methods that say "if the collection state version is > 1, do this, otherwise do that", when really it shouldn't have to know or care about that. It should get a Collection object from the state reader and tell it to update itself, and then we have different classes that know to either edit the master clusterstate.json or the collection's state.json. And then if we add a third collection type that's implemented in a completely different way, we just add a new Collection implementation, and you don't have to pick through everywhere where you're checking the state version and add a new branch.

          Java gives us classes and interfaces, it makes sense to use them.

          Shalin Shekhar Mangar added a comment -

          It should get a Collection object from the state reader and tell it to update itself, and then we have different classes that know to either edit the master clusterstate.json or the collection's state.json. And then if we add a third collection type that's implemented in a completely different way, we just add a new Collection implementation, and you don't have to pick through everywhere where you're checking the state version and add a new branch.

          I don't like that idea for three reasons:

          1. If a collection object knows how to update itself, I don't see how batching of state updates can be supported inside Overseer.
          2. We don't plan to have multiple versions or collection types. This will be the only format eventually and we just need to keep this around for 4.x. We shouldn't be designing to optimize for multiple versions.
          3. The ClusterState/DocCollection right now are just data objects and the only place where you need to save them to ZK is Overseer so keeping that logic inside Overseer makes sense to me.

          Therefore I think abstraction isn't going to help if no other component needs it.

          Mark Miller added a comment -

          Sure, but there's no abstraction here.

          I largely agree with you. A plan like that would help prevent abuse of whatever we do here. That is my largest concern left - when you put in tmp stuff that is not designed with solid abstractions, it generally leads to bad places as devs leave, fall off, change, other devs start cranking on that code, etc.

          That's why, even if we don't do abstractions because we decide they are too expensive given our future plans or something, we need to document the hell out of in between code to protect it and guide it.

          The nice thing about abstractions is that it's easier to document, it's harder for others to screw up, and it's easier to do proper deprecation and removal.

          Noble Paul added a comment -

          Abstractions are definitely good. But here it is overkill.

          There is only one entity which should ever construct and persist a clusterstate (that is the overseer). And going forward we don't need these multiple 'stateFormat' things.

          I don't agree with the idea of a data object being responsible for its persistence. This is a typical ORM thing where the data object offers just a structure and the ORM framework worries about how/where the persistence is done

          Mark Miller added a comment -

          But here it is overkill

          I think that may be a fine argument to make, but then I wish you would address: when you put in tmp stuff that is not designed with solid abstractions, it generally leads to bad places as devs leave, fall off, change, other devs start cranking on that code, etc.

          You can read through my comments above for more details, but this has been my main concern. Putting in a bunch of stuff that is hard for new devs to understand, that's half way between two worlds, that counts on the same devs coming back and finishing later, that tends to be a red flag to me.

          The commenting, the plan for deprecation, comments to make new devs able to understand any of this, that seems really lacking. Currently, I think abstractions is the best way to solve a lot of that. I'm willing to be shown it can be done without that though.

          My feeling is that it's currently implemented like one organization or entity owns the code, and not like it's going into a melting pot with an extremely unpredictable future, as it is.

          Noble Paul added a comment -

          My feeling is that it's currently implemented like one organization or entity owns the code

          Is this comment made after looking at the latest patch? The current system does not have too many changes from the old system. I know a couple of places where I would need to add documentation (inside ClusterState). I personally don't see any places where devs can screw up. I'll be glad to address any shortcomings.

          Mark Miller added a comment -

          Yes, that comment comes after looking at the latest patch.

          Let's also finally nuke all references to "external", ie ExternalCollectionsTest

          Mark Miller added a comment -

          I'll try and do a pass of my own on the patch later this week or early next week.

          Noble Paul added a comment -

          Mark Miller did you manage to try an alternate approach?

          Mark Miller added a comment -

          Sorry, another week of craziness on my end. I'll dedicate my Saturday work to this and get a new patch up over the weekend. Unless I see something I really have a big issue with, I'll mainly concentrate on the documentation I think we should have for other devs and we can increment on most anything else after committing.

          Mark Miller added a comment -

          Here is a patch that focuses only on bringing the code up to standard project formatting guidelines.

          reviewboard: https://reviews.apache.org/r/24220/

          changes: https://reviews.apache.org/r/24220/diff/1-2/

          Mark Miller added a comment -

          Another patch to follow and then some comments.

          Mark Miller added a comment -

          Also, FYI, I think 'SOLR-5810 State of external collections not displayed in cloud graph panel' should be resolved as part of this issue if it's still applicable. I think that's a pretty critical part of this as we are not considering this an optional 'mode' but an architecture transition. Looks like some good progress has already been made on SOLR-5810, so hopefully it's just a matter of buttoning that up and rolling it into this.

          Mark Miller added a comment -

          I'm not sure I agree with the use of collections api param for controlling this anymore.

          Also:

          • I think if we are transitioning to this, on 5x at least, it should default to true. We should probably remove the old mode as well, but I'm all for waiting on that so we don't diverge 4x from 5x too early.
          • I don't really see the need to support a mixed install. It seems like you should either have the new mode or the deprecated mode. Exposing the option of the two in the collections api is leaking internals that made more sense when this was implemented as an optional collection feature. If an expert user with an existing install would like to turn this on for new collections, that's a consideration, but I think that should be an expert, perhaps undoc'd param, and this overall architecture change should be controlled at the top Solr config level, perhaps at the cluster config level?
          Mark Miller added a comment -

          I'm in the middle of a patch, but putting it down tonight. I'll come back to it tomorrow night.

          Noble Paul added a comment -

          I think if we are transitioning to this, on 5x at least, it should default to true.

          Yes, absolutely.

          I don't really see the need to support a mixed install ...

          If I already have a cluster with branch_4x and I wish to move to a newer build, don't we need to support both formats? The next question is: if I have many collections already in my cluster, how/when should the split happen?

          Timothy Potter added a comment -

          Here's an updated patch based on the previous one posted by Mark on Aug. 4. The only difference is this patch removes the updates to the ZookeeperInfoServlet as all UI changes needed for this ticket are being handled in SOLR-5810 (and the patches were competing with each other because this one contained some older changes to ZookeeperInfoServlet). Please apply this patch to trunk and then the latest from SOLR-5810

          Mark Miller added a comment -

          I'll try and finish my new patch this weekend.

          I need to finish documenting and tie up the mixed install stuff.

          don't we need to support both formats?

          Yeah, we do, but I think the primary way we do it should not be as a stateFormat= param when creating a collection. I don't mind that as an expert, unsupported override or something, but by and large I think this should be a system wide config, similar to legacyMode. Users with old config would get the old style. If they want the new, they can change their config. Most new downloaders should be running all new style, existing users old style unless they change a sys config and switch. Or if they want to try the new on a per collection basis, they can use the expert collection create param, but we shouldn't encourage that.

          Noble Paul added a comment - - edited

          I don't mind that as an expert, unsupported override or something, but by and large I think this should be a system wide config, similar to legacyMode

          +1

          Mark Miller added a comment -

          I need to finish documenting and tie up the mixed install stuff.

          Start of a release is a good time to get this in. I'll try and wrap this up very shortly.

          Mark Miller added a comment -

          https://reviews.apache.org/r/24220/

          New patch. I have not had a chance to incorporate Tim's changes yet - had some stuff in progress when his combined patch went up.

          I think this should be a system wide config, similar to legacyMode

          I have not done this yet but I'm fine doing it after commit.

          There do seem to be some more common test fails with this - but we can likely work that out after commit as well.

          Noble Paul added a comment -

          Tim's patch is a dependency. Without that, the system is broken.

          Mark Miller added a comment -

          And...?

          Noble Paul added a comment -

          The patch didn't apply to the trunk

          Mark Miller added a comment -

          I had a little trouble applying it at first, but it applied with: patch -p0 < ~/Downloads/col-split-3.patch

          New patch with SOLR-5810 here: https://reviews.apache.org/r/24220/

          We still have to handle the config, and a test or two seem to fail a bit easier (mostly SyncSliceTest). I think we can deal with that based on how Jenkins responds after commit though.

          Mark Miller added a comment -

          Added a new patch that should pass precommit.

          Noble Paul added a comment -

          Anything that stops us from committing this?

          Noble Paul added a comment -

          Patch updated to trunk; I plan to commit this soon.

          Mark Miller added a comment -

          Fire away. I'll make a new issue for the config thing we talked about above.

          ASF subversion and git services added a comment -

          Commit 1624556 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1624556 ]

          split clusterstate.json SOLR-5473, SOLR-5474, SOLR-5810
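
          To illustrate what the split enables for clients (a hedged sketch, not Solr's internal code): with per-collection state nodes in ZooKeeper, a client can read and watch just the collections it uses instead of one global clusterstate.json. The connect string, collection name, and path below are assumptions for illustration.

            import org.apache.zookeeper.WatchedEvent;
            import org.apache.zookeeper.Watcher;
            import org.apache.zookeeper.ZooKeeper;
            import org.apache.zookeeper.data.Stat;

            // Hedged sketch: read one collection's state and set a watch on that node only,
            // so updates to other collections never wake this client.
            public class WatchOneCollection {
              public static void main(String[] args) throws Exception {
                ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
                String path = "/collections/mycollection/state.json";  // assumed collection name
                Watcher stateWatcher = (WatchedEvent event) ->
                    System.out.println("state changed: " + event.getType() + " on " + event.getPath());
                Stat stat = new Stat();
                byte[] data = zk.getData(path, stateWatcher, stat);  // watch fires only for this node
                System.out.println("version " + stat.getVersion() + ", " + data.length + " bytes");
                zk.close();
              }
            }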

          ASF subversion and git services added a comment -

          Commit 1624557 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1624557 ]

          SOLR-5473 set eol property

          ASF subversion and git services added a comment -

          Commit 1624650 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1624650 ]

          SOLR-5473, SOLR-5474, SOLR-5810 ... NO SVN KEYWORDS! ! !

          Jessica Cheng Mallet added a comment -

          Hey guys, under heavy load-testing yesterday with this I started seeing lots of ZooKeeper connection loss on the client side when refreshing the CloudSolrServer collectionStateCache, and this was causing very high client-side 99.9th-percentile latency (~15 sec). I swapped the cache out for Guava's LoadingCache (which does locking to ensure only one thread loads the content for a key while the other threads that want the same key wait), and the connection loss went away and the 99.9th-percentile latency also went down to about 1 sec.
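
          For anyone hitting the same thing: the change described above amounts to replacing an ad-hoc cache with a loading cache that serializes loads per key. A hedged sketch follows; the key/value types and the fetch method are simplified stand-ins, not the actual CloudSolrServer code.

            import java.util.concurrent.TimeUnit;
            import com.google.common.cache.CacheBuilder;
            import com.google.common.cache.CacheLoader;
            import com.google.common.cache.LoadingCache;

            // Hedged sketch of per-key loading: exactly one thread fetches a collection's state
            // at a time; other threads asking for the same key wait for that load to finish
            // instead of all hammering ZooKeeper at once.
            public class CollectionStateCache {
              private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                  .maximumSize(1000)
                  .expireAfterWrite(60, TimeUnit.SECONDS)
                  .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String collection) {
                      return fetchStateFromZk(collection);  // runs once per key at a time
                    }
                  });

              public String getState(String collection) throws Exception {
                return cache.get(collection);  // concurrent callers for the same key block on one load
              }

              private String fetchStateFromZk(String collection) {
                return "{}";  // placeholder for the real ZooKeeper read of the collection's state
              }
            }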

          Steve Rowe added a comment -

          SyncSliceTest started failing after the r1624556 commit: see SOLR-6514.

          Noble Paul added a comment -

          Jessica Cheng Mallet, can you please open a ticket so we can fix it separately?

          ASF subversion and git services added a comment -

          Commit 1624770 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1624770 ]

          SOLR-5473 removed unused API, logging

          Jessica Cheng Mallet added a comment -

          Noble Paul, here: https://issues.apache.org/jira/browse/SOLR-6521. Thanks!

          ASF subversion and git services added a comment -

          Commit 1626131 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1626131 ]

          SOLR-5473 use zk.clean() instead of zk.delete()
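
          For context on this change (hedged, not the SolrZkClient source): a plain ZooKeeper delete fails on a node that still has children, so removing a whole subtree needs a recursive clean. A minimal sketch with the raw ZooKeeper client:

            import org.apache.zookeeper.KeeperException;
            import org.apache.zookeeper.ZooKeeper;

            // Hedged sketch: remove a node and everything under it, which is what a
            // recursive "clean" style call provides over a single-node delete.
            public class RecursiveDelete {
              static void clean(ZooKeeper zk, String path) throws KeeperException, InterruptedException {
                for (String child : zk.getChildren(path, false)) {
                  clean(zk, path + "/" + child);  // remove children first
                }
                zk.delete(path, -1);  // -1 = any version; this call fails if children remain
              }
            }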

          ASF subversion and git services added a comment -

          Commit 1626132 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1626132 ]

          SOLR-5473 use zk.clean() instead of zk.delete()

          ASF subversion and git services added a comment -

          Commit 1640648 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1640648 ]

          SOLR-5473: Fixing the typo in CHANGES.txt

          ASF subversion and git services added a comment -

          Commit 1640649 from Anshum Gupta in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1640649 ]

          SOLR-5473: Fixing the typo in CHANGES.txt (merge from trunk)

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.


            People

            • Assignee:
              Noble Paul
            • Reporter:
              Noble Paul
            • Votes:
              0
            • Watchers:
              23
