SOLR-1724: Real Basic Core Management with Zookeeper

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4
    • Fix Version/s: 4.9, 5.0
    • Component/s: multicore
    • Labels:
      None

      Description

      Though we're implementing cloud, I need something real soon that I
      can play with and deploy. So this'll be a patch that only deploys
      new cores, and that's about it. The architecture is really simple:

      On Zookeeper there'll be a directory that contains files
      representing the state of the cores for a given set of servers,
      which will look like the following:

      /production/cores-1.txt
      /production/cores-2.txt
      /production/core-host-1-actual.txt (ephemeral node per host)

      Where each core-N.txt file contains:

      hostname,corename,instanceDir,coredownloadpath

      coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc

      and

      core-host-actual.txt contains:

      hostname,corename,instanceDir,size

      Every time a new core-N.txt file is added, the listening host
      finds its entry in the list and begins the process of trying to
      match the entries. Upon completion, it updates its
      /core-host-1-actual.txt file to its completed state, or logs an error.

      When all host actual files are written (without errors), then a
      new core-1-actual.txt file is written which can be picked up by
      another process that can create a new core proxy.
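
      As an illustration of the format above, here is a minimal sketch (not
      part of the attached patches; the class and method names are invented
      for this example) of how a listening host could parse a cores file and
      pick out its own entries:

import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the cores file format described above:
 * each line is "hostname,corename,instanceDir,coredownloadpath".
 * Class and field names are invented for this sketch; the patch itself
 * may use different structures.
 */
public class CoresFileParser {

  public static class CoreEntry {
    public final String hostname;
    public final String coreName;
    public final String instanceDir;
    public final String downloadPath;

    CoreEntry(String hostname, String coreName, String instanceDir, String downloadPath) {
      this.hostname = hostname;
      this.coreName = coreName;
      this.instanceDir = instanceDir;
      this.downloadPath = downloadPath;
    }
  }

  /** Returns only the entries that belong to the given host. */
  public static List<CoreEntry> parseForHost(String coresFileContents, String myHostname) {
    List<CoreEntry> mine = new ArrayList<CoreEntry>();
    for (String line : coresFileContents.split("\n")) {
      line = line.trim();
      if (line.length() == 0) continue;
      String[] cols = line.split(",");
      if (cols.length != 4) continue; // skip malformed lines
      if (cols[0].equals(myHostname)) {
        mine.add(new CoreEntry(cols[0], cols[1], cols[2], cols[3]));
      }
    }
    return mine;
  }
}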

      1. SOLR-1724.patch
        114 kB
        Jason Rutherglen
      2. SOLR-1724.patch
        85 kB
        Jason Rutherglen
      3. SOLR-1724.patch
        83 kB
        Jason Rutherglen
      4. SOLR-1724.patch
        99 kB
        Jason Rutherglen
      5. SOLR-1724.patch
        64 kB
        Jason Rutherglen
      6. SOLR-1724.patch
        61 kB
        Jason Rutherglen
      7. SOLR-1724.patch
        58 kB
        Jason Rutherglen
      8. SOLR-1724.patch
        58 kB
        Jason Rutherglen
      9. SOLR-1724.patch
        54 kB
        Jason Rutherglen
      10. SOLR-1724.patch
        49 kB
        Jason Rutherglen
      11. gson-1.4.jar
        162 kB
        Jason Rutherglen
      12. hadoop-0.20.2-dev-test.jar
        1.49 MB
        Jason Rutherglen
      13. hadoop-0.20.2-dev-core.jar
        2.56 MB
        Jason Rutherglen
      14. commons-lang-2.4.jar
        256 kB
        Jason Rutherglen
      15. SOLR-1724.patch
        24 kB
        Jason Rutherglen

        Activity

        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Hoss Man added a comment -

        Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

        email notification suppressed to prevent mass-spam
        pseudo-unique token identifying these issues: hoss20120321nofix36

        Robert Muir added a comment -

        3.4 -> 3.5

        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Jason Rutherglen added a comment -

        Fixed the unit tests that were failing due to the switch over to using CoreContainer's initZooKeeper method. ZkNodeCoresManager is instantiated in CoreContainer.

        There's a beginning of a UI in zkcores.jsp

        I think we still need a core move test. I'm thinking of adding backing up a core as an action that may be performed in a new cores version file.

        Jason Rutherglen added a comment -

        I'm starting work on the cores file upload. The cores file is in JSON format, and can be assembled by an entirely different process (i.e. the core assignment creation is decoupled from core deployment).

        I need to figure out how Solr HTML HTTP file uploading works... There's probably an example somewhere.
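
        As a rough illustration of the JSON cores file idea (using the Gson
        library attached to this issue; the field names and layout here are
        invented for the example, not the patch's actual schema), an external
        process could emit a cores file like this:

import com.google.gson.Gson;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of producing a JSON cores file with Gson
 * (the gson-1.4.jar attached to this issue). The field names are
 * invented; the real layout is defined by the patch/wiki spec.
 */
public class CoresFileWriter {

  static class CoreSpec {
    String hostname;
    String coreName;
    String instanceDir;
    String downloadUrl;

    CoreSpec(String hostname, String coreName, String instanceDir, String downloadUrl) {
      this.hostname = hostname;
      this.coreName = coreName;
      this.instanceDir = instanceDir;
      this.downloadUrl = downloadUrl;
    }
  }

  public static void main(String[] args) {
    List<CoreSpec> cores = Arrays.asList(
        new CoreSpec("host1", "core1", "/solr/cores/core1", "hdfs://nn/cores/core1.zip"),
        new CoreSpec("host2", "core2", "/solr/cores/core2", "hdfs://nn/cores/core2.zip"));
    // An entirely separate process can build this JSON and upload it to
    // ZooKeeper, decoupling core assignment from core deployment.
    String json = new Gson().toJson(cores);
    System.out.println(json);
  }
}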

        Jason Rutherglen added a comment -

        Started on the nodes reporting their status to separate files that are ephemeral nodes; there's no sense in keeping them around if the node isn't up, and the status is legitimately ephemeral. In this case, the status will be something like "Core download 45% (7 GB of 15 GB)".
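
        A minimal sketch of what publishing such an ephemeral status znode
        could look like with the plain ZooKeeper client (the path layout and
        status text are illustrative, not taken from the patch):

import java.io.UnsupportedEncodingException;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Sketch of publishing per-node status as an ephemeral znode. Races
 * between exists() and create() are ignored for brevity.
 */
public class NodeStatusPublisher {
  private final ZooKeeper zk;
  private final String statusPath;

  public NodeStatusPublisher(ZooKeeper zk, String hostname) {
    this.zk = zk;
    this.statusPath = "/production/status/" + hostname; // illustrative path
  }

  /** Creates the ephemeral node on the first call, then overwrites its data. */
  public void publish(String status)
      throws KeeperException, InterruptedException, UnsupportedEncodingException {
    byte[] data = status.getBytes("UTF-8");
    if (zk.exists(statusPath, false) == null) {
      // Ephemeral: the status disappears automatically when the session ends.
      zk.create(statusPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    } else {
      zk.setData(statusPath, data, -1); // -1 skips the version check
    }
    // Example call: publish("Core download 45% (7 GB of 15 GB)");
  }
}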

        Lance Norskog added a comment -

        Are you using any of these?

        An Eclipse plug-in:
        http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

        A Django (Python web toolkit) app:
        http://github.com/phunt/zookeeper_dashboard

        A Swing UI:
        http://issues.apache.org/jira/browse/ZOOKEEPER-418

        All seem to have recent activity. Maybe one of them could become a custom monitor.

        If you want to monitor a horde of machines & apps via JMX, Hyperic might be the right tool:
        http://support.hyperic.com/display/DOC/JMX+Plugin
        http://support.hyperic.com/display/DOC/JMX+Plugin+Tutorial

        When I tried Hyperic out a couple of years ago I was really impressed.

        Jason Rutherglen added a comment -

        In thinking about this some more, in order for the functionality
        provided in this issue to be more useful, there could be a web
        based UI to easily view the master cores table. There can
        additionally be an easy way to upload the new cores version into
        Zookeeper. I'm not sure if the uploading should be web based or
        command line, I'm figuring web based, simply because this is
        more in line with the rest of Solr.

        As a core is installed or is in the midst of some other process
        (such as backing itself up), the node/NodeCoresManager can
        report the ongoing status to Zookeeper. For large cores (i.e. 20
        GB) it's important to see how they're doing, and if they're
        taking too long, begin some remedial action. The UI can display
        the statuses.

        Jason Rutherglen added a comment -

        Backing a core up works, at least according to the test case... I will probably begin to test this patch in a staging environment next, where Zookeeper is run in its own process and a real HDFS cluster is used.

        Jason Rutherglen added a comment -

        Zipping from a Lucene directory works and has a test case.

        A ReplicationHandler is added by default under a unique name; if one exists already, we still create our own, for the express purpose of locking an index commit point, zipping it, then uploading it to, for example, HDFS. This part will likely be written next.
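
        For illustration only, here is a simplified stand-in that zips the
        files of an on-disk index directory with java.util.zip; the actual
        patch works through a Lucene Directory and a ReplicationHandler-held
        commit point, which this sketch ignores:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Zips every regular file in an index directory into a single zip file. */
public class IndexZipper {

  public static void zipDirectory(File indexDir, File zipFile) throws IOException {
    ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFile));
    try {
      byte[] buf = new byte[64 * 1024];
      File[] files = indexDir.listFiles();
      if (files == null) throw new IOException("Not a directory: " + indexDir);
      for (File f : files) {
        if (!f.isFile()) continue; // index files are flat; skip subdirectories
        FileInputStream in = new FileInputStream(f);
        try {
          out.putNextEntry(new ZipEntry(f.getName()));
          int n;
          while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
          }
          out.closeEntry();
        } finally {
          in.close();
        }
      }
    } finally {
      out.close();
    }
  }
}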

        Jason Rutherglen added a comment -

        I'm not sure how we'll handle (or if we even need to) installing
        a new core over an existing core of the same name, in other
        words core replacement. I think the instanceDir would need to be
        different, which means we'll need to detect and fail on the case
        of a new cores version (aka desired state) trying to install
        itself into an existing core's instanceDir. Otherwise this
        potential error case is costly in production.

        It makes me wonder about the shard id in Solr Cloud and how that
        can be used to uniquely identify an installed core, if a core of
        a given name is not guaranteed to be the same across Solr
        servers.
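
        To make the instanceDir conflict check from the first paragraph
        concrete, here is a hypothetical sketch (the map shapes are invented
        for the example; the patch may track installed cores differently):

import java.util.Map;

/**
 * Refuse to install a core whose instanceDir is already owned by a
 * different core on this node.
 */
public class InstanceDirConflictChecker {

  /**
   * @param installed instanceDir -> coreName for cores already on this node
   * @param desired   coreName -> instanceDir from the new cores version
   */
  public static void failOnConflicts(Map<String, String> installed, Map<String, String> desired) {
    for (Map.Entry<String, String> e : desired.entrySet()) {
      String owner = installed.get(e.getValue());
      if (owner != null && !owner.equals(e.getKey())) {
        throw new IllegalStateException("instanceDir " + e.getValue()
            + " already belongs to core " + owner
            + "; refusing to install core " + e.getKey());
      }
    }
  }
}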

        Jason Rutherglen added a comment -

        I added a test case that simulates attempting to install a bad core.

        Still need to get the backup a Solr core to HDFS working.

        Jason Rutherglen added a comment -

        We need a test case with a partial install, and for cleaning up any extraneous files afterwards.

        Jason Rutherglen added a comment -

        Actually, I just realized the whole exercise of moving a core is pointless, it's exactly the same as replication, so this is a non-issue...

        I'm going to work on backing up a core to HDFS...

        Jason Rutherglen added a comment -

        I'm taking the approach of simply reusing SnapPuller and a replication handler for each core... This'll be faster to implement and more reliable for the first release (i.e. I won't run into little wacky bugs because I'll be reusing code that's well tested).

        Jason Rutherglen added a comment -

        We need a URL type parameter to define if a URL in a core info is to a zip file or to a Solr server download point.

        Jason Rutherglen added a comment -

        Some further notes... I can reuse the replication code, but am going to place the functionality into core admin handler because it needs to work across cores and not have to be configured in each core's solrconfig.

        Also, we need to somehow support merging cores... Is that available yet? Looks like merge indexes is only for directories?

        Jason Rutherglen added a comment -

        I think the check on whether a conf file's been modified, to reload the core, can borrow from the replication handler and check the diff based on the checksum of the files... Though this somewhat complicates the storage of the checksum and the resultant JSON file.
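
        A minimal sketch of the checksum idea using CRC32 from java.util.zip;
        the replication handler has its own checksum plumbing, so this is only
        an illustration of the reload decision:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.Checksum;

/** Compute a per-file CRC32 and reload only when it differs from the stored value. */
public class ConfChecksum {

  public static long checksum(File f) throws IOException {
    Checksum crc = new CRC32();
    CheckedInputStream in = new CheckedInputStream(new FileInputStream(f), crc);
    try {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // reading through the stream updates the checksum
      }
    } finally {
      in.close();
    }
    return crc.getValue();
  }

  /** Example: reload the core only if the conf file changed since the last recorded checksum. */
  public static boolean needsReload(File confFile, long storedChecksum) throws IOException {
    return checksum(confFile) != storedChecksum;
  }
}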

        Jason Rutherglen added a comment -

        Will this http access also allow a cluster with
        incrementally updated cores to replicate a core after a node
        failure?

        You're talking about moving an existing core into HDFS? That's a
        great idea... I'll add it to the list!

        Maybe for general "actions" to the system, there can be a ZK
        directory acting as a queue that contains actions to be
        performed by the cluster. When the action is completed, its
        corresponding action file is deleted.
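
        A sketch of that action-queue idea using persistent sequential znodes
        (the paths are illustrative, not from the patch):

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Actions are persistent sequential znodes under a queue path; a worker
 * takes the oldest one and deletes it when the action completes.
 */
public class ZkActionQueue {
  private static final String QUEUE = "/production/actions"; // illustrative path
  private final ZooKeeper zk;

  public ZkActionQueue(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Enqueue an action; the sequence suffix gives FIFO ordering. */
  public String enqueue(byte[] actionData) throws KeeperException, InterruptedException {
    return zk.create(QUEUE + "/action-", actionData,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
  }

  /** Return the oldest pending action path, or null if the queue is empty. */
  public String peekOldest() throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren(QUEUE, false);
    if (children.isEmpty()) return null;
    Collections.sort(children); // sequence suffixes sort chronologically
    return QUEUE + "/" + children.get(0);
  }

  /** Delete the action file once the cluster has completed it. */
  public void complete(String actionPath) throws KeeperException, InterruptedException {
    zk.delete(actionPath, -1);
  }
}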

        Jason Rutherglen added a comment -

        For the above core moving, utilizing the existing Java replication will probably be suitable. However, in all cases we need to copy the contents of all files related to the core (meaning everything under conf and data). How does one accomplish this?

        Ted Dunning added a comment -


        Will this http access also allow a cluster with incrementally updated cores to replicate a core after a node failure?

        Jason Rutherglen added a comment -

        Also needed is the ability to move an existing core to a
        different Solr server. The core will need to be copied via
        direct HTTP file access, from a Solr server to another Solr
        server. There is no need to zip the core first.

        This feature is useful for core indexes that have been
        incrementally built, then need to be archived (i.e. the index was not
        constructed using Hadoop).

        Jason Rutherglen added a comment - edited

        Removing cores seems to work well, on to modified cores... I'm checkpointing progress so that if things break, I can easily roll back.

        Jason Rutherglen added a comment -

        We need a test case for deleted and modified cores.

        Jason Rutherglen added a comment -

        Added a way to hold a given number of host or cores files around in ZK, after which, the oldest are deleted.
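
        A sketch of such a retention policy with the plain ZooKeeper client,
        ordering children by creation zxid (the path and policy shape are
        illustrative, not the patch's implementation):

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Keep only the newest maxToKeep children under a parent znode; delete the rest. */
public class ZkRetentionPolicy {

  public static void pruneOldest(ZooKeeper zk, String parentPath, int maxToKeep)
      throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren(parentPath, false);
    if (children.size() <= maxToKeep) return;

    // Order children oldest-first by creation zxid (czxid increases over time).
    TreeMap<Long, String> byAge = new TreeMap<Long, String>();
    for (String child : children) {
      Stat stat = zk.exists(parentPath + "/" + child, false);
      if (stat != null) {
        byAge.put(stat.getCzxid(), child);
      }
    }

    int toDelete = byAge.size() - maxToKeep;
    for (Map.Entry<Long, String> e : byAge.entrySet()) {
      if (toDelete-- <= 0) break;
      zk.delete(parentPath + "/" + e.getValue(), -1);
    }
  }
}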

        Jason Rutherglen added a comment -

        I need to add the deletion policy before I can test this in a real environment, otherwise bunches of useless files will pile up in ZK.

        Jason Rutherglen added a comment -

        Updated to HEAD

        Jason Rutherglen added a comment -

        I need to figure out how to integrate this with the Solr Cloud distributed search stuff... Hmm... Maybe I'll start with the Solr Cloud test cases?

        Jason Rutherglen added a comment -
        • No-commit
        • NodeCoresManagerTest.testInstallCores works
        • There are HDFS test cases using MiniDFSCluster
        Jason Rutherglen added a comment -

        No-commit

        NodeCoresManager[Test] needs more work

        A CoreController matchHosts unit test was added to CoreControllerTest

        Jason Rutherglen added a comment -

        There's a wiki for this issue where the general specification is defined:

        http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

        Jason Rutherglen added a comment -

        Here's an update, we're onto the actual Solr node portion of the code, and some tests around that. I'm focusing on downloading cores out of HDFS because that's my use case.

        Jason Rutherglen added a comment -

        Hadoop and Gson dependencies

        Jason Rutherglen added a comment -

        For some reason ZkTestServer doesn't need to be shutdown any longer?

        Mark Miller added a comment -

        The ZK port changed in ZkTestServer

        Yeah - too easy to bump against a local ZooKeeper server with the default port, so I've switched it up.

        Jason Rutherglen added a comment -

        The ZK port changed in ZkTestServer

        Jason Rutherglen added a comment -

        I did an svn update, though now am seeing the following error:

        java.util.concurrent.TimeoutException: Could not connect to ZooKeeper within 5000 ms
        at org.apache.solr.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:131)
        at org.apache.solr.cloud.SolrZkClient.<init>(SolrZkClient.java:106)
        at org.apache.solr.cloud.SolrZkClient.<init>(SolrZkClient.java:72)
        at org.apache.solr.cloud.CoreControllerTest.testCores(CoreControllerTest.java:48)

        Jason Rutherglen added a comment -

        Need to have a command line tool that dumps the state of the
        existing cluster from ZK, out to a json file for a particular
        version.

        For my setup I'll have a program that'll look at this cluster
        state file and generate an input file that'll be written to ZK,
        which essentially instructs the Solr nodes to match the new
        cluster state. This allows me to easily write my own
        functionality that operates on the cluster that's external to
        deploying new software into Solr.

        Jason Rutherglen added a comment -

        If you know you're going to not store file data at nodes
        that have children (the only way that downloading to a real file
        system makes sense), you could just call getChildren - if there
        are children, it's a dir, otherwise it's a file. Doesn't work for
        empty dirs, but you could also just do getData, and if it
        returns null, treat it as a dir, else treat it as a file.

        Thanks Mark...

        Mark Miller added a comment -

        Well, a path could be both a directory and a file with the zookeeper abstraction, which doesn't really work on a standard filesystem.

        If you know you're going to not store file data at nodes that have children (the only way that downloading to a real file system makes sense), you could just call getChildren - if there are children, it's a dir, otherwise it's a file. Doesn't work for empty dirs, but you could also just do getData, and if it returns null, treat it as a dir, else treat it as a file.
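
        A sketch of that heuristic applied to a recursive download
        (illustrative only; znodes that carry both data and children are not
        handled, per the caveat above):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

/**
 * Treat a znode with children as a directory and a znode with data
 * (and no children) as a file, then mirror the tree to local disk.
 */
public class ZkTreeDownloader {

  public static void download(ZooKeeper zk, String zkPath, File localDir)
      throws KeeperException, InterruptedException, IOException {
    List<String> children = zk.getChildren(zkPath, false);
    if (!children.isEmpty()) {
      // Directory case: recurse into each child.
      File dir = new File(localDir, lastSegment(zkPath));
      dir.mkdirs();
      for (String child : children) {
        download(zk, zkPath + "/" + child, dir);
      }
      return;
    }
    byte[] data = zk.getData(zkPath, false, null);
    if (data == null) {
      // No children and no data: treat as an empty directory.
      new File(localDir, lastSegment(zkPath)).mkdirs();
      return;
    }
    // File case: write the node data to a local file.
    FileOutputStream out = new FileOutputStream(new File(localDir, lastSegment(zkPath)));
    try {
      out.write(data);
    } finally {
      out.close();
    }
  }

  private static String lastSegment(String zkPath) {
    return zkPath.substring(zkPath.lastIndexOf('/') + 1);
  }
}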

        Jason Rutherglen added a comment -

        Do we have some code that recursively downloads a tree of files from ZK? The challenge is I don't see a way to find out if a given path represents a directory or not.

        Jason Rutherglen added a comment -

        commons-lang-2.4.jar is required

        Ted Dunning added a comment -

        ... I agree, I'm not really into ephemeral
        ZK nodes for Solr hosts/nodes. The reason is contact with ZK is
        highly superficial and can be intermittent.

        I have found that when I was having trouble with ZK connectivity, the problems were simply surfacing issues that I had anyway. You do have to configure the ZK client to not have long pauses (that is incompatible with SOLR how?) and you may need to adjust the timeouts on the ZK side. More importantly, any issues with ZK connectivity will have their parallels with any other heartbeat mechanism, and replicating a heartbeat system that tries to match ZK for reliability is going to be a significant source of very nasty bugs. Better not to rewrite something that already works. Keep in mind that ZK connection issues are not the same as session expiration. Katta has a fairly important set of bugfixes now to make that distinction, and ZK will soon handle connection loss on its own.

        It isn't a bad idea to keep shards around for a while if a node goes down. That can seriously decrease the cost of momentary outages such as for a software upgrade. The idea is that when the node comes back, it can advertise availability of some shards and replication of those shards should cease.

        Jason Rutherglen added a comment -

        Here's the first cut... I agree, I'm not really into ephemeral
        ZK nodes for Solr hosts/nodes. The reason is contact with ZK is
        highly superficial and can be intermittent. I'm mostly concerned
        with ensuring the core operations succeed on a given server. If
        a server goes down, there needs to be more than ZK to prove it,
        and if it goes down completely, I'll simply reallocate its
        cores to another server using the core management mechanism
        provided in this patch.

        The issue is still being worked on, specifically the Solr server
        portion that downloads the cores from some location, or performs
        operations. The file format will move to json.

        Yonik Seeley added a comment -

        A somewhat secondary issue is whether the cluster master has to be involved in every query.

        Yeah, that's never been part of the plans AFAIK. In fact, in this first simple/short iteration, we have no master at all (or if there is one that can direct anything, it will be customer code).

        After trying several options in production, what I find is best is that the master lay down a statement of desired state and the nodes publish their status in a different and ephemeral fashion.

        Right - this is captured on the solr-cloud wiki with the ideas of "model" and "state". So far we're only dealing with state - reflecting what the current cluster looks like, and the details of how "model" type stuff (what state the nodes should strive for) hasn't been spelled out yet.

        Of course, this has hijacked Jason's issue... sorry!

        Ted Dunning added a comment -

        We actually started out that way... (when a node went down there wasn't really any trace it ever existed) but have been moving away from it.
        ZK may not just be a reflection of the cluster but may also control certain aspects of the cluster that you want persistent. For example, marking a node as "disabled" (i.e. don't use it). One could create APIs on the node to enable and disable and have that reflected in ZK, but it seems like more work than simply saying "change this znode".

        I see this as a conflation of two or three goals that leads to trouble. All of the goals are worthy and important, but the conflation of them leads to difficult problems. Taken separately, the goals are easily met.

        One goal is the reflection of current cluster state. That is most reliably done using ephemeral files roughly as I described.

        Another goal is the reflection of constraints or desired state of the cluster. This is best handled as you describe, with permanent files since you don't want this desired state to disappear when a node disappears.

        The real issue is making sure that the source of whatever information is most directly connected to the physical manifestation of that information. Moreover, it is important in some cases (node state, for instance) that the state stay correct even when the source of that state loses control by crashing, hanging or becoming otherwise indisposed. Inserting an intermediary into this chain of control is a bad idea. Replicating ZK's rather well implemented ephemeral state mechanism with ad hoc heartbeats is also a bad idea (remember how many bugs there have been in hadoop relative to heartbeats and the name node?).

        A somewhat secondary issue is whether the cluster master has to be involved in every query. That seems like a really bad bottleneck to me and Katta provides a proof of existence that this is not necessary.

        After trying several options in production, what I find is best is that the master lay down a statement of desired state and the nodes publish their status in a different and ephemeral fashion. The master can record a history or there may be general directions such as your disabled list however you like but that shouldn't be mixed into the node status because you otherwise get into a situation where ephemeral files can no longer be used for what they are good at.

        Yonik Seeley added a comment -

        These are discussed here: http://oss.101tec.com/jira/browse/KATTA-43

        The basic design consideration is that failure of a node needs to automagically update the ZK state accordingly. This allows all important updates to files to go one direction as well.

        We actually started out that way... (when a node went down there wasn't really any trace it ever existed) but have been moving away from it.
        ZK may not just be a reflection of the cluster but may also control certain aspects of the cluster that you want persistent. For example, marking a node as "disabled" (i.e. don't use it). One could create APIs on the node to enable and disable and have that reflected in ZK, but it seems like more work than simply saying "change this znode".

        Jason Rutherglen added a comment -

        This'll be a patch on the cloud branch to reuse what's started; I don't see any core management code in there yet, so this looks complementary.

        Jason Rutherglen added a comment -

        Ted,

        Thanks for the Katta link.

        This patch will likely de-emphasize the distributed search part,
        which is where the ephemeral node is used (i.e. a given server
        lists its current state). I basically want to take care of this
        one little deployment aspect of cores, improving on the wacky
        hackedy system I'm running today. Then IF it works, then I'll
        look at the distributed search part, hopefully in a totally
        separate patch.

        Jason Rutherglen added a comment -

        Note to self: I need a way to upload an empty core/conf dir from the command line, basically into ZK, then reference that core from ZK (I think this'll work?). I'd rather not rely on a separate http server or something... The size of a jarred-up Solr conf dir shouldn't be too much for ZK?
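
        A sketch of uploading a conf dir into ZK, one znode per file (the
        paths and flat layout are illustrative); note the default per-znode
        size limit of roughly 1 MB (jute.maxbuffer), which is the constraint
        a jarred conf dir would have to fit under:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/** Upload each regular file in a conf directory as a child znode of zkParent. */
public class ConfDirUploader {

  public static void upload(ZooKeeper zk, File confDir, String zkParent)
      throws KeeperException, InterruptedException, IOException {
    if (zk.exists(zkParent, false) == null) {
      zk.create(zkParent, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
    File[] files = confDir.listFiles();
    if (files == null) throw new IOException("Not a directory: " + confDir);
    for (File f : files) {
      if (!f.isFile()) continue; // flat upload; subdirectories are skipped in this sketch
      zk.create(zkParent + "/" + f.getName(), readFully(f),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }

  private static byte[] readFully(File f) throws IOException {
    byte[] data = new byte[(int) f.length()];
    FileInputStream in = new FileInputStream(f);
    try {
      int off = 0;
      while (off < data.length) {
        int n = in.read(data, off, data.length - off);
        if (n < 0) throw new IOException("Unexpected EOF in " + f);
        off += n;
      }
    } finally {
      in.close();
    }
    return data;
  }
}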

        Ted Dunning added a comment -

        Katta had some interesting issues in the design of this.

        These are discussed here: http://oss.101tec.com/jira/browse/KATTA-43

        The basic design consideration is that failure of a node needs to automagically update the ZK state accordingly. This allows all important updates to files to go one direction as well.

        Jason Rutherglen added a comment -

        Additionally, upon successful completion of a core-version deployment to a set of nodes, a customizable deletion-policy-like mechanism will, by default, clean up the old cores on the system.


          People

          • Assignee: Unassigned
          • Reporter: Jason Rutherglen
          • Votes: 0
          • Watchers: 7
