Solr
  1. Solr
  2. SOLR-6629

Watch /collections zk node on all nodes

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      The main clusterstate.json is refreshed/used as a poor substitute for informing all nodes about new or deleted collections even when the collection being created or deleted has state format > 1. When we move away from state format 1 then we should do away with this workaround and start watching the /collections zk node on all nodes.

      1. SOLR-6629.patch
        24 kB
        Shalin Shekhar Mangar
      2. SOLR-6629.patch
        23 kB
        Scott Blum
      3. SOLR-6629.patch
        23 kB
        Scott Blum
      4. SOLR-6629.patch
        17 kB
        Shalin Shekhar Mangar
      5. SOLR-6629.patch
        17 kB
        Scott Blum
      6. SOLR-6629-new.patch
        22 kB
        Scott Blum

        Activity

        Hide
        Scott Blum added a comment -

        Might get some love as part of SOLR-5756

        Show
        Scott Blum added a comment - Might get some love as part of SOLR-5756
        Hide
        Scott Blum added a comment -

        Now that SOLR-5756 is landed, can we flesh this ticket out some more?
        Does this boil down to "watchChildren on /collections at all times, and actively manage the lazy-loaded collection state list"?

        Show
        Scott Blum added a comment - Now that SOLR-5756 is landed, can we flesh this ticket out some more? Does this boil down to "watchChildren on /collections at all times, and actively manage the lazy-loaded collection state list"?
        Hide
        Noble Paul added a comment -

        yes

        Show
        Noble Paul added a comment - yes
        Hide
        Scott Blum added a comment -

        Note: LegacyClusterStateWatcher.refreshAndWatch() is what currently triggers refreshLazyFormat2Collections(). In other words, refreshes on the lazy collection set are tied to changes in shared clusterstate.json. A better solution would be to do a child watch on /collections instead and completely decouple the lazy collection set from changes to clusterstate.json. This might also involve tracking down any overseer code that forces a now-unnecessary clusterstate.json mutation purely as a signal.

        Show
        Scott Blum added a comment - Note: LegacyClusterStateWatcher.refreshAndWatch() is what currently triggers refreshLazyFormat2Collections(). In other words, refreshes on the lazy collection set are tied to changes in shared clusterstate.json. A better solution would be to do a child watch on /collections instead and completely decouple the lazy collection set from changes to clusterstate.json. This might also involve tracking down any overseer code that forces a now-unnecessary clusterstate.json mutation purely as a signal.
        Hide
        Scott Blum added a comment -

        First pass. ZkStateWriterTest is failing, but I need guidance on whether the test expectations still make sense, or whether I'm not tracking the batching correctly now.

        Show
        Scott Blum added a comment - First pass. ZkStateWriterTest is failing, but I need guidance on whether the test expectations still make sense, or whether I'm not tracking the batching correctly now.
        Hide
        Scott Blum added a comment - - edited

        Second pass, I think this is actually in a good state (ALL TESTS PASSING). We do need to chat through the new handling of lazyCollections some more, but I thought I would present to you a working version in code of what I was talking about.

        We should also consider optimizing lazyCollectionState to do temporary caching (cache for 2 seconds?) to avoid hitting ZK rapidly in succession.

        Show
        Scott Blum added a comment - - edited Second pass, I think this is actually in a good state (ALL TESTS PASSING). We do need to chat through the new handling of lazyCollections some more, but I thought I would present to you a working version in code of what I was talking about. We should also consider optimizing lazyCollectionState to do temporary caching (cache for 2 seconds?) to avoid hitting ZK rapidly in succession.
        Hide
        Scott Blum added a comment -

        Added another patch, demonstrating the Lazy Collection caching.

        Show
        Scott Blum added a comment - Added another patch, demonstrating the Lazy Collection caching.
        Hide
        Scott Blum added a comment -

        Shalin Shekhar Mangar ping for review

        Show
        Scott Blum added a comment - Shalin Shekhar Mangar ping for review
        Hide
        Shalin Shekhar Mangar added a comment -

        Patch wasn't applying cleanly to trunk so I updated it. This one should apply cleanly.

        Thanks Scott. Overall looks great to me!

        A few comments:

        1. In ZkStateReader#refreshCollectionList, how do you ensure that a stateFormat=1 collection which is not interesting to us, isn't put into the lazyCollectionStates map? Or does it not matter because constructState() will always give priority to the ones in shared cluster state?
        2. nit - The variables in ZkStateWriter "wasPreviouslyState1" and "isCurrentlyState1" were confusing me so I renamed them to "wasPreviouslyStateFormat1" and "isCurrentlyStateFormat1"
        Show
        Shalin Shekhar Mangar added a comment - Patch wasn't applying cleanly to trunk so I updated it. This one should apply cleanly. Thanks Scott. Overall looks great to me! A few comments: In ZkStateReader#refreshCollectionList, how do you ensure that a stateFormat=1 collection which is not interesting to us, isn't put into the lazyCollectionStates map? Or does it not matter because constructState() will always give priority to the ones in shared cluster state? nit - The variables in ZkStateWriter "wasPreviouslyState1" and "isCurrentlyState1" were confusing me so I renamed them to "wasPreviouslyStateFormat1" and "isCurrentlyStateFormat1"
        Hide
        Shalin Shekhar Mangar added a comment -

        In ZkStateReader#refreshCollectionList, how do you ensure that a stateFormat=1 collection which is not interesting to us, isn't put into the lazyCollectionStates map? Or does it not matter because constructState() will always give priority to the ones in shared cluster state?

        Hmm, answering my own question, this indeed looks deliberate and seems like a clever way of avoiding the ZK exists call for state.json!

        Show
        Shalin Shekhar Mangar added a comment - In ZkStateReader#refreshCollectionList, how do you ensure that a stateFormat=1 collection which is not interesting to us, isn't put into the lazyCollectionStates map? Or does it not matter because constructState() will always give priority to the ones in shared cluster state? Hmm, answering my own question, this indeed looks deliberate and seems like a clever way of avoiding the ZK exists call for state.json!
        Hide
        Scott Blum added a comment -

        Should fix the test break.

        Show
        Scott Blum added a comment - Should fix the test break.
        Hide
        Scott Blum added a comment -

        Removed lazy collection caching behavior. All tests pass on trunk for me.

        Show
        Scott Blum added a comment - Removed lazy collection caching behavior. All tests pass on trunk for me.
        Hide
        Shalin Shekhar Mangar added a comment -

        Scott Blum - did you upload the right patch? I still see caching in LazyCollectionRef.

        Show
        Shalin Shekhar Mangar added a comment - Scott Blum - did you upload the right patch? I still see caching in LazyCollectionRef.
        Hide
        Scott Blum added a comment -

        Nope, I screwed it up, sorry! Correct patch attached.

        Show
        Scott Blum added a comment - Nope, I screwed it up, sorry! Correct patch attached.
        Hide
        Shalin Shekhar Mangar added a comment -

        I added javadocs and comments in a few places. This is ready. I'll commit shortly.

        Show
        Shalin Shekhar Mangar added a comment - I added javadocs and comments in a few places. This is ready. I'll commit shortly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1697562 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1697562 ]

        SOLR-6629: Watch /collections zk node on all nodes

        Show
        ASF subversion and git services added a comment - Commit 1697562 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1697562 ] SOLR-6629 : Watch /collections zk node on all nodes
        Hide
        ASF subversion and git services added a comment -

        Commit 1697670 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1697670 ]

        SOLR-6629: Watch /collections zk node on all nodes

        Show
        ASF subversion and git services added a comment - Commit 1697670 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1697670 ] SOLR-6629 : Watch /collections zk node on all nodes
        Hide
        Shalin Shekhar Mangar added a comment -

        Thanks Scott!

        Show
        Shalin Shekhar Mangar added a comment - Thanks Scott!
        Hide
        Scott Blum added a comment -

        Yay!

        Show
        Scott Blum added a comment - Yay!

          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development