Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8973

TX-frenzy on Zookeeper when collection is put to use

    Details

    • Flags:
      Patch

      Description

      This is to do with a distributed data-race. Core-creation happens at a time when collection is not yet visible to the node. In this case a fallback code-path is used which de-references collection-state lazily (on demand) as opposed to setting a watch and keeping it cached locally.

      Due to this, as requests towards the core mount, it generates ZK fetch for collection proportionately. On a large solr-cloud cluster, this generates several Gbps of TX traffic on ZK nodes. This affects indexing throughput(which floors) in addition to running ZK node out of network bandwidth.

      On smaller solr-cloud clusters its hard to run into, because probability of this race materializing reduces.

      1. SOLR-8973.patch
        7 kB
        Shalin Shekhar Mangar
      2. SOLR-8973.patch
        2 kB
        Janmejay Singh
      3. SOLR-8973.patch
        3 kB
        Janmejay Singh
      4. SOLR-8973-ZkStateReader.patch
        6 kB
        Scott Blum

        Activity

        Hide
        janmejay Janmejay Singh added a comment -

        Patch allows a node to wait upto 1 minute for collection to become visible. The wait-time can perhaps be made configurable, but this is sufficient (because tick time on zk is usually 2 seconds and delay is never more than 2 ticks).

        Show
        janmejay Janmejay Singh added a comment - Patch allows a node to wait upto 1 minute for collection to become visible. The wait-time can perhaps be made configurable, but this is sufficient (because tick time on zk is usually 2 seconds and delay is never more than 2 ticks).
        Hide
        erickerickson Erick Erickson added a comment -
        Show
        erickerickson Erick Erickson added a comment - Is this related to https://issues.apache.org/jira/browse/SOLR-8416?
        Hide
        janmejay Janmejay Singh added a comment -

        No, SOLR-8416 deals with collections handler waiting for cores to be active, this deals with core-create handler waiting for collection definition to be known/visible (as published by collection-create operation).

        Show
        janmejay Janmejay Singh added a comment - No, SOLR-8416 deals with collections handler waiting for cores to be active, this deals with core-create handler waiting for collection definition to be known/visible (as published by collection-create operation).
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Thanks for the bug report, Janmejay Singh! There's definitely a race here but instead of waiting for the collection to be visible (which has its own problem that I'll describe later), we can simply call `ZkStateReader.addCollectionWatch` anyway (without checking if the collection exists already) which will force ZkStateReader to fetch the collection state from Zk and cache it.

        The problem with waiting for the collection as done in this patch is that it is allowed for collections to be created using the core admin API directly i.e. the collection is created in ZK by the Overseer when the core publishes its state. So in such cases, you will see spurious waits.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Thanks for the bug report, Janmejay Singh ! There's definitely a race here but instead of waiting for the collection to be visible (which has its own problem that I'll describe later), we can simply call `ZkStateReader.addCollectionWatch` anyway (without checking if the collection exists already) which will force ZkStateReader to fetch the collection state from Zk and cache it. The problem with waiting for the collection as done in this patch is that it is allowed for collections to be created using the core admin API directly i.e. the collection is created in ZK by the Overseer when the core publishes its state. So in such cases, you will see spurious waits.
        Hide
        janmejay Janmejay Singh added a comment -

        Won't the watch-setup fail when collection is not available on ZK?

        Show
        janmejay Janmejay Singh added a comment - Won't the watch-setup fail when collection is not available on ZK?
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Won't the watch-setup fail when collection is not available on ZK?

        This is not possible because when a collection create API is invoked, the Overseer message handler will first create the collection in ZK, wait upto 30 seconds to see the collection in ZK and only then proceeds to create the cores. So the collection always exists in ZK when the core create is called and in such a case the watch-setup will always succeed.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Won't the watch-setup fail when collection is not available on ZK? This is not possible because when a collection create API is invoked, the Overseer message handler will first create the collection in ZK, wait upto 30 seconds to see the collection in ZK and only then proceeds to create the cores. So the collection always exists in ZK when the core create is called and in such a case the watch-setup will always succeed.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        You may also be interested in SOLR-8327 which is a separate bug but has similar symptoms.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - You may also be interested in SOLR-8327 which is a separate bug but has similar symptoms.
        Hide
        janmejay Janmejay Singh added a comment -

        No, there is a difference in what overseer and core-api (on a different node) see at the same instant. Some ZK nodes may be lagging (ZK does not ensure visibility of changes across all nodes at the same time), when clients can't tolerate delay in visibility of changes, they need to execute sync operation before read.

        Overseer's session may be connected to a zk-node that is ahead of the zk-node that the core-node is connected to. So while overseer sees the change, core-node will not (unless it executes sync before read).

        If all nodes saw the same version as overseer, the race wouldn't exist at all.

        We can change the patch to lazily setup watch for a collection that is fetched using active(on-demand) fetcher. In this model, once the fetch is done successfully, it will setup watch for the collection before returning the fetched collection-def.

        Show
        janmejay Janmejay Singh added a comment - No, there is a difference in what overseer and core-api (on a different node) see at the same instant. Some ZK nodes may be lagging (ZK does not ensure visibility of changes across all nodes at the same time), when clients can't tolerate delay in visibility of changes, they need to execute sync operation before read. Overseer's session may be connected to a zk-node that is ahead of the zk-node that the core-node is connected to. So while overseer sees the change, core-node will not (unless it executes sync before read). If all nodes saw the same version as overseer, the race wouldn't exist at all. We can change the patch to lazily setup watch for a collection that is fetched using active(on-demand) fetcher. In this model, once the fetch is done successfully, it will setup watch for the collection before returning the fetched collection-def.
        Hide
        janmejay Janmejay Singh added a comment -

        SOLR-8327 seems to propose the same thing except in the context of solrj client. I was suggesting the same approach as an viable alternative to wait (in this case we are not worried about #watches, because #watches would anyway have been the same had the delay not been present.

        Show
        janmejay Janmejay Singh added a comment - SOLR-8327 seems to propose the same thing except in the context of solrj client. I was suggesting the same approach as an viable alternative to wait (in this case we are not worried about #watches, because #watches would anyway have been the same had the delay not been present.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Ah, you are right that the watcher won't be set if the new collection's znode is not visible.

        But the solution that I proposed will still work – if you see ZkStateReader.constructState(), you'll see that it tries to re-create the watcher every time a collection is in the "interestingCollections" set but neither in legacyCollectionStates nor in watchedCollectionStates. Since the constructState method is called each and every time any watched znode changes, the new collection's state will eventually be cached. I don't think that any alternate solution will achieve a better result. This may be a side-effect of the way the state is managed but it will work. We can document this as a code comment as well. The bug here is that we do not call ZkStateReader.addCollectionWatch at all if the collection is not visible to the node yet.

        Scott Blum Since you recently wrote most of this state management code, what do you think?

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Ah, you are right that the watcher won't be set if the new collection's znode is not visible. But the solution that I proposed will still work – if you see ZkStateReader.constructState(), you'll see that it tries to re-create the watcher every time a collection is in the "interestingCollections" set but neither in legacyCollectionStates nor in watchedCollectionStates. Since the constructState method is called each and every time any watched znode changes, the new collection's state will eventually be cached. I don't think that any alternate solution will achieve a better result. This may be a side-effect of the way the state is managed but it will work. We can document this as a code comment as well. The bug here is that we do not call ZkStateReader.addCollectionWatch at all if the collection is not visible to the node yet. Scott Blum Since you recently wrote most of this state management code, what do you think?
        Hide
        janmejay Janmejay Singh added a comment -

        Cool, allow me some time to go trough the interestingCollections and watch-setup code. I'll upload a new patch tomorrow (assuming it won't throw any surprises our way).

        Show
        janmejay Janmejay Singh added a comment - Cool, allow me some time to go trough the interestingCollections and watch-setup code. I'll upload a new patch tomorrow (assuming it won't throw any surprises our way).
        Hide
        dragonsinth Scott Blum added a comment -

        I'll need to look at the code again and run it to verify, but I'm 99% sure adding a collection watch to a non-existent collection will cause it to get mapped as soon as the collection's state node appears. That was for sure my design intent, so if it doesn't work that way, it's a bug and I'll fix it ASAP. I'll check it out and get back.

        Show
        dragonsinth Scott Blum added a comment - I'll need to look at the code again and run it to verify, but I'm 99% sure adding a collection watch to a non-existent collection will cause it to get mapped as soon as the collection's state node appears. That was for sure my design intent, so if it doesn't work that way, it's a bug and I'll fix it ASAP. I'll check it out and get back.
        Hide
        dragonsinth Scott Blum added a comment - - edited

        Shalin Shekhar Mangar I've come to the conclusion that ZkStateReader isn't doing as well as it could be. Adding watchers in constructState() seems (retroactively) like a hack. It doesn't correctly cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but no state.json child yet appears.

        I believe I have a patch and test to fix this. Attached it to this JIRA, but not sure if I should create a new one.

        Show
        dragonsinth Scott Blum added a comment - - edited Shalin Shekhar Mangar I've come to the conclusion that ZkStateReader isn't doing as well as it could be. Adding watchers in constructState() seems (retroactively) like a hack. It doesn't correctly cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but no state.json child yet appears. I believe I have a patch and test to fix this. Attached it to this JIRA, but not sure if I should create a new one.
        Hide
        dragonsinth Scott Blum added a comment -

        BTW: this may also fix a "bug" where we could queue more watchers than we need on a single node.

        Show
        dragonsinth Scott Blum added a comment - BTW: this may also fix a "bug" where we could queue more watchers than we need on a single node.
        Hide
        janmejay Janmejay Singh added a comment -

        What you suggested should work in either case (where core-admin RPC request arrives first, or when /collections path watch triggers first).

        Show
        janmejay Janmejay Singh added a comment - What you suggested should work in either case (where core-admin RPC request arrives first, or when /collections path watch triggers first).
        Hide
        janmejay Janmejay Singh added a comment -

        I created this patch without applying Scott's patch. Let me know if rebase is required.

        Show
        janmejay Janmejay Singh added a comment - I created this patch without applying Scott's patch. Let me know if rebase is required.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        I really like Scott's solution. It explicitly handles the race condition in a clear way. But it is not complete i.e. we still need to modify ZkController to add a watch for the collection regardless of whether it is visible or not. So this patch merges both Scott's and Janmejay's patches.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - I really like Scott's solution. It explicitly handles the race condition in a clear way. But it is not complete i.e. we still need to modify ZkController to add a watch for the collection regardless of whether it is visible or not. So this patch merges both Scott's and Janmejay's patches.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        This is all yours Scott. Commit away!

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - This is all yours Scott. Commit away!
        Hide
        erickerickson Erick Erickson added a comment -

        Scott:

        As it happens, I may be interested in this patch Real Soon Now and have access to a dev environment that would stress this a lot. How close do you think this is to being committable? Or, more precisely, testable? The "dev environment" is expensive to run in this case, so when there's a patch you're satisfied with the patch I might be able to give a it a spin; it doesn't have to be committed. Dev environment is 5x, do you think that matters?

        Thanks!
        Erick

        Show
        erickerickson Erick Erickson added a comment - Scott: As it happens, I may be interested in this patch Real Soon Now and have access to a dev environment that would stress this a lot . How close do you think this is to being committable? Or, more precisely, testable? The "dev environment" is expensive to run in this case, so when there's a patch you're satisfied with the patch I might be able to give a it a spin; it doesn't have to be committed. Dev environment is 5x, do you think that matters? Thanks! Erick
        Hide
        dragonsinth Scott Blum added a comment -

        Anshum Gupta Landing this on master/6x now. Another one to consider for 5.5.1 backport (and I'll volunteer).

        Show
        dragonsinth Scott Blum added a comment - Anshum Gupta Landing this on master/6x now. Another one to consider for 5.5.1 backport (and I'll volunteer).
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 71a5870ae7022d7a30ebd1147de36c44aa14dc63 in lucene-solr's branch refs/heads/master from Scott Blum
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71a5870 ]

        SOLR-8973: Zookeeper frenzy when a core is first created.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 71a5870ae7022d7a30ebd1147de36c44aa14dc63 in lucene-solr's branch refs/heads/master from Scott Blum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71a5870 ] SOLR-8973 : Zookeeper frenzy when a core is first created.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 7c356bad06418d95c5394a7d7d5bf5e54cbf39bb in lucene-solr's branch refs/heads/branch_6x from Scott Blum
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c356ba ]

        SOLR-8973: Zookeeper frenzy when a core is first created.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 7c356bad06418d95c5394a7d7d5bf5e54cbf39bb in lucene-solr's branch refs/heads/branch_6x from Scott Blum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c356ba ] SOLR-8973 : Zookeeper frenzy when a core is first created.
        Hide
        anshumg Anshum Gupta added a comment -

        +1 Scott!

        Show
        anshumg Anshum Gupta added a comment - +1 Scott!
        Hide
        erickerickson Erick Erickson added a comment -

        +1 for 5.5.1

        Show
        erickerickson Erick Erickson added a comment - +1 for 5.5.1
        Hide
        erickerickson Erick Erickson added a comment -

        Scott Blum So I just applied this as-is to 5.5 for my own exploration and it applied cleanly. Would that be what you expect?

        Thanks,
        Erick

        Show
        erickerickson Erick Erickson added a comment - Scott Blum So I just applied this as-is to 5.5 for my own exploration and it applied cleanly. Would that be what you expect? Thanks, Erick
        Hide
        dragonsinth Scott Blum added a comment -

        Yeah it wouldn't surprise me if it applies clean.

        Show
        dragonsinth Scott Blum added a comment - Yeah it wouldn't surprise me if it applies clean.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 9c7a031f988a0fa3a15d6fc23f7269d0c12fae16 in lucene-solr's branch refs/heads/branch_5x from Scott Blum
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c7a031 ]

        SOLR-8973: Zookeeper frenzy when a core is first created.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 9c7a031f988a0fa3a15d6fc23f7269d0c12fae16 in lucene-solr's branch refs/heads/branch_5x from Scott Blum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c7a031 ] SOLR-8973 : Zookeeper frenzy when a core is first created.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 802ee6f8c90ba6462f29c963a760aa1dfeeb09b1 in lucene-solr's branch refs/heads/branch_5_5 from Scott Blum
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=802ee6f ]

        SOLR-8973: Zookeeper frenzy when a core is first created.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 802ee6f8c90ba6462f29c963a760aa1dfeeb09b1 in lucene-solr's branch refs/heads/branch_5_5 from Scott Blum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=802ee6f ] SOLR-8973 : Zookeeper frenzy when a core is first created.
        Hide
        arafalov Alexandre Rafalovitch added a comment -

        Is SOLR-9001 (overseer exception in the OOTB cloud example) result of the same issue? Or is it something different?

        Show
        arafalov Alexandre Rafalovitch added a comment - Is SOLR-9001 (overseer exception in the OOTB cloud example) result of the same issue? Or is it something different?
        Hide
        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271

        Show
        hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271
        Hide
        steve_rowe Steve Rowe added a comment -

        Reopening to backport to 6.0.1.

        Show
        steve_rowe Steve Rowe added a comment - Reopening to backport to 6.0.1.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit b663e5bcad9974f2d80c16b85c862407a38290e0 in lucene-solr's branch refs/heads/branch_6_0 from Scott Blum
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b663e5b ]

        SOLR-8973: Zookeeper frenzy when a core is first created.

        For branch_6_0: Modified ZkStateReaderTest to use ZkStateReader.updateClusterState() instead of .forceUpdateCollection(), since SOLR-8745 will land in 6.1.

        Show
        jira-bot ASF subversion and git services added a comment - Commit b663e5bcad9974f2d80c16b85c862407a38290e0 in lucene-solr's branch refs/heads/branch_6_0 from Scott Blum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b663e5b ] SOLR-8973 : Zookeeper frenzy when a core is first created. For branch_6_0: Modified ZkStateReaderTest to use ZkStateReader.updateClusterState() instead of .forceUpdateCollection(), since SOLR-8745 will land in 6.1.
        Hide
        steve_rowe Steve Rowe added a comment -

        Bulk close issues included in the 6.0.1 release.

        Show
        steve_rowe Steve Rowe added a comment - Bulk close issues included in the 6.0.1 release.

          People

          • Assignee:
            dragonsinth Scott Blum
            Reporter:
            janmejay Janmejay Singh
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development