CouchDB
  1. CouchDB
  2. COUCHDB-1010

improve supervision tree and config changes reload

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.2
    • Component/s: Database Core
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      port from bigcouch which is under apache license 2.

      • improve supervision
      • clean config changes handling.

      This has been tested in an undisclosed yet project named refuge wich is also based on couchdb but is more OTP and rebar friendly.

      1. COUCHDB-1010.patch
        16 kB
        Benoit Chesneau
      2. 0001-export-functions-to-couch_config.patch
        6 kB
        Benoit Chesneau
      3. 0002-another-port-from-bigcouch.-i.patch
        12 kB
        Benoit Chesneau
      4. 0002-another-port-from-bigcouch.-i.patch
        13 kB
        Benoit Chesneau
      5. 0003-do-like-in-other-module.-s-couch_os_daemons-MODULE.patch
        0.8 kB
        Benoit Chesneau
      6. 0002-another-port-from-bigcouch.-i.patch
        12 kB
        Benoit Chesneau

        Activity

        Hide
        Paul Joseph Davis added a comment -

        There are two issues being addressed by this patch that are both fine, but should probably be in two patches so that the history is a bit more clear on what's going on.

        The first issue of moving around the supervision tree is fine and looks good to me though I'm not super knowledgeable on all the finer intricacies of OTP so I could be missing something.

        The second bit has to do with only passing exported functions to couch_config. I think the changes for this are good (though not comprehensive). But I think the parts of the patch dealing with the config system would be better as a standalone patch. I realize that this might mean making one patch for the config system and then moving some of that code with a second patch for the supervision tree, but the clarity in the history would be well served by signaling our intent here.

        Also, there's a hunk that introduces a typo into the license header in couch_server_sup.erl:

        -% License for the specific language governing permissions and limitations under
        +% License for the specific languag governing permissions and limitations under

        Other than that, LGTM.

        Show
        Paul Joseph Davis added a comment - There are two issues being addressed by this patch that are both fine, but should probably be in two patches so that the history is a bit more clear on what's going on. The first issue of moving around the supervision tree is fine and looks good to me though I'm not super knowledgeable on all the finer intricacies of OTP so I could be missing something. The second bit has to do with only passing exported functions to couch_config. I think the changes for this are good (though not comprehensive). But I think the parts of the patch dealing with the config system would be better as a standalone patch. I realize that this might mean making one patch for the config system and then moving some of that code with a second patch for the supervision tree, but the clarity in the history would be well served by signaling our intent here. Also, there's a hunk that introduces a typo into the license header in couch_server_sup.erl: -% License for the specific language governing permissions and limitations under +% License for the specific languag governing permissions and limitations under Other than that, LGTM.
        Hide
        Benoit Chesneau added a comment - - edited

        i will split the patch in afternoon. Also thanks for the review.

        Show
        Benoit Chesneau added a comment - - edited i will split the patch in afternoon. Also thanks for the review.
        Hide
        Benoit Chesneau added a comment -

        [PATCH 1/2] export functions to couch_config.

        Show
        Benoit Chesneau added a comment - [PATCH 1/2] export functions to couch_config.
        Hide
        Benoit Chesneau added a comment -

        [PATCH 2/2] another port from bigcouch. i

        • Split supervision in different module.
        • Handle icu driver in its own server so we can reload it when config
          change without reloading all the supervision.
        Show
        Benoit Chesneau added a comment - [PATCH 2/2] another port from bigcouch. i Split supervision in different module. Handle icu driver in its own server so we can reload it when config change without reloading all the supervision.
        Hide
        Benoit Chesneau added a comment -

        I've splitted previous patch in two. It also fixes typo and add missing license header.

        Show
        Benoit Chesneau added a comment - I've splitted previous patch in two. It also fixes typo and add missing license header.
        Hide
        Paul Joseph Davis added a comment -

        Looks better. For the config patch, did I already make the os daemons and httpd proxy use the exported style?

        The supervision patch looks good except that there's a bug in the config_change handler for the ICU driver directory. It attempts to ping the driver gen_server with a reload_driver message but the driver gen_server just ignores any info messages. Either we should just kill it and let the supervisor restart it or alternatively give it a reload handler. If we do go with a reload, I'd also prefer using gen_server:cast/2 instead of raw message passing for aesthetics.

        Also, for the commit message in passing exported functions to couch_config, please make a note that this is important because using anonymous functions breaks code reloading.

        Show
        Paul Joseph Davis added a comment - Looks better. For the config patch, did I already make the os daemons and httpd proxy use the exported style? The supervision patch looks good except that there's a bug in the config_change handler for the ICU driver directory. It attempts to ping the driver gen_server with a reload_driver message but the driver gen_server just ignores any info messages. Either we should just kill it and let the supervisor restart it or alternatively give it a reload handler. If we do go with a reload, I'd also prefer using gen_server:cast/2 instead of raw message passing for aesthetics. Also, for the commit message in passing exported functions to couch_config, please make a note that this is important because using anonymous functions breaks code reloading.
        Hide
        Benoit Chesneau added a comment -

        mmm right it's ignored. I will replace by a cast function. for os daemons since they aren't yet in bigcouch, i didn't ported the changes. I will make a third patch for that.

        • benoît
        Show
        Benoit Chesneau added a comment - mmm right it's ignored. I will replace by a cast function. for os daemons since they aren't yet in bigcouch, i didn't ported the changes. I will make a third patch for that. benoît
        Hide
        Benoit Chesneau added a comment -

        PATCH 2/3] another port from bigcouch. i

        • Split supervision in different module.
        • Handle icu driver in its own server so we can reload it when config
          change without reloading all the supervision.

        edited patch to handler driver reload. The tests show the driver is always loaded but that don't hurt.

        Show
        Benoit Chesneau added a comment - PATCH 2/3] another port from bigcouch. i Split supervision in different module. Handle icu driver in its own server so we can reload it when config change without reloading all the supervision. edited patch to handler driver reload. The tests show the driver is always loaded but that don't hurt.
        Hide
        Benoit Chesneau added a comment -

        config_changes was always exported in os_daemons module. Little change.

        Show
        Benoit Chesneau added a comment - config_changes was always exported in os_daemons module. Little change.
        Hide
        Paul Joseph Davis added a comment -

        Why have you duplicated the code between init/1 and handle_cast/2?

        Also, did you have an example for this reloading code? It doesn't look correct to me. Reading the erl_ddll page it looks like there are quite a few ways for the driver reload to go awry in ways that aren't warm and fuzzy. Reading through the man page and given the way we store the port in process dicts, it looks like we should be opening the drier with the kill_ports option and then force reloading in the try_load. Also, with the monitor option we should probably setup a timeout to ensure that the we get the actual message back in some amount of time.

        Show
        Paul Joseph Davis added a comment - Why have you duplicated the code between init/1 and handle_cast/2? Also, did you have an example for this reloading code? It doesn't look correct to me. Reading the erl_ddll page it looks like there are quite a few ways for the driver reload to go awry in ways that aren't warm and fuzzy. Reading through the man page and given the way we store the port in process dicts, it looks like we should be opening the drier with the kill_ports option and then force reloading in the try_load. Also, with the monitor option we should probably setup a timeout to ensure that the we get the actual message back in some amount of time.
        Hide
        Paul Joseph Davis added a comment -

        Adam pointed out on IRC that I'm starting to go down a bit of a rabbit hole to get the driver reloading to actually work. Reading the docs on driver reloading it appears to be something that would be quite difficult to get right because of how we have long lived ports open to the driver amongst other things.

        Seeing as the driver code is relatively stable we came to the conclusion that the answer to this particular problem may be to just not even attempt the reload for now. Just leave the gen_server with its initial load and ignore config changes for the driver. If someone needs to reload in the future for some insane reason, then they can work through the details of try_load and friends and what happens to processes with open ports and such forth.

        Show
        Paul Joseph Davis added a comment - Adam pointed out on IRC that I'm starting to go down a bit of a rabbit hole to get the driver reloading to actually work. Reading the docs on driver reloading it appears to be something that would be quite difficult to get right because of how we have long lived ports open to the driver amongst other things. Seeing as the driver code is relatively stable we came to the conclusion that the answer to this particular problem may be to just not even attempt the reload for now. Just leave the gen_server with its initial load and ignore config changes for the driver. If someone needs to reload in the future for some insane reason, then they can work through the details of try_load and friends and what happens to processes with open ports and such forth.
        Hide
        Benoit Chesneau added a comment - - edited

        I didn't duplicate. one is loading load is trying to load or reload.

        No like you I read the man. Reading this doc, it seem that using pending option in try_load imply the kill_ports option:

        [[[
        The reload option can be either the atom pending, in which reloading is requested for any driver and will be effectuated when all ports opened against the driver are closed. The replacement of the driver will in this case take place regardless of if there are still pending users having the driver loaded! The option also triggers port-killing (if the kill_ports driver option is used) even though there are pending users, making it usable for forced driver replacement,
        ]]]

        Not sure if in this case we need to set the option then. Do you mean using the same code in init and reload ? I wonder anyway if we should cast, maybe we should call here and just rely on this timeout.

        Show
        Benoit Chesneau added a comment - - edited I didn't duplicate. one is loading load is trying to load or reload. No like you I read the man. Reading this doc, it seem that using pending option in try_load imply the kill_ports option: [[[ The reload option can be either the atom pending, in which reloading is requested for any driver and will be effectuated when all ports opened against the driver are closed. The replacement of the driver will in this case take place regardless of if there are still pending users having the driver loaded! The option also triggers port-killing (if the kill_ports driver option is used) even though there are pending users, making it usable for forced driver replacement, ]]] Not sure if in this case we need to set the option then. Do you mean using the same code in init and reload ? I wonder anyway if we should cast, maybe we should call here and just rely on this timeout.
        Hide
        Benoit Chesneau added a comment -

        ah your second comment just came. That what I guessed because the reload wasn't handled. The main problem in driver reload is to handle multiple loading. I think reading the doc we should try to unload first then reload.

        Show
        Benoit Chesneau added a comment - ah your second comment just came. That what I guessed because the reload wasn't handled. The main problem in driver reload is to handle multiple loading. I think reading the doc we should try to unload first then reload.
        Hide
        Paul Joseph Davis added a comment -

        Yeah, just got accosted by Koco for being a stickler.

        I'm leaning fairly far in favor of just abandoning attempts to handle reloading. I read through the man pages and there are quite a few caveats to how this stuff is supposed to work. Rather than guess about it and then try and write some sort of awkward tests for it, I'd just as soon say no, especially considring I don't foresee it even being used.

        Show
        Paul Joseph Davis added a comment - Yeah, just got accosted by Koco for being a stickler. I'm leaning fairly far in favor of just abandoning attempts to handle reloading. I read through the man pages and there are quite a few caveats to how this stuff is supposed to work. Rather than guess about it and then try and write some sort of awkward tests for it, I'd just as soon say no, especially considring I don't foresee it even being used.
        Hide
        Benoit Chesneau added a comment -

        updated patch following our discussion on irc. reload wasn't working. So remove the code.

        Show
        Benoit Chesneau added a comment - updated patch following our discussion on irc. reload wasn't working. So remove the code.
        Hide
        Paul Joseph Davis added a comment -

        Looks good to me.

        Show
        Paul Joseph Davis added a comment - Looks good to me.
        Hide
        Benoit Chesneau added a comment -

        If no one comment against, I will commit it later today.

        Show
        Benoit Chesneau added a comment - If no one comment against, I will commit it later today.
        Hide
        Paul Joseph Davis added a comment -

        Looks good to me.

        Show
        Paul Joseph Davis added a comment - Looks good to me.
        Hide
        Adam Kocoloski added a comment -

        I'd like to take one last look first, will do so shortly.

        Show
        Adam Kocoloski added a comment - I'd like to take one last look first, will do so shortly.
        Hide
        Benoit Chesneau added a comment -

        np, let me know

        Show
        Benoit Chesneau added a comment - np, let me know
        Hide
        Adam Kocoloski added a comment -

        Sorry for the delay. I've finished reviewing the 0001 patch. Aside from the little whitespace error introduced in couch_db_update_notifier_sup, my only complaint is about the couch_server_sup handler. Calling exit(whereis(couch_server_sup), shutdown) does nothing at all when the server is started by the application master (which is what happens when CouchDB is started using the shell script). Curiously, it does shut down the server when called from the etap tests, which have a lower-level way of starting couch.

        I propose we fix couch_server_sup:stop() separately because it may entail a major reworking of the etap tests. Here's an alternative patch. I think it's a bit more fine-grained, too, in that it only restarts couch_secondary_services, not the whole tree, when the "daemons" block changes.

        https://github.com/kocolosk/couchdb/commit/1778db

        Show
        Adam Kocoloski added a comment - Sorry for the delay. I've finished reviewing the 0001 patch. Aside from the little whitespace error introduced in couch_db_update_notifier_sup, my only complaint is about the couch_server_sup handler. Calling exit(whereis(couch_server_sup), shutdown) does nothing at all when the server is started by the application master (which is what happens when CouchDB is started using the shell script). Curiously, it does shut down the server when called from the etap tests, which have a lower-level way of starting couch. I propose we fix couch_server_sup:stop() separately because it may entail a major reworking of the etap tests. Here's an alternative patch. I think it's a bit more fine-grained, too, in that it only restarts couch_secondary_services, not the whole tree, when the "daemons" block changes. https://github.com/kocolosk/couchdb/commit/1778db
        Hide
        Adam Kocoloski added a comment -

        Ok, I finished the reviewing the 0002 patch. Made a few small corrections, and also removed the couch_util:start_driver/1 function and updated the ICU driver etap test.

        https://github.com/kocolosk/couchdb/commit/d0aa33

        The third patch is a trivial one, so from my perspective I think these changes are ready to be committed. I'll rebase my 1010-config-change-supervision-tree onto trunk and do so later today. Thanks Benoit for getting this started.

        Show
        Adam Kocoloski added a comment - Ok, I finished the reviewing the 0002 patch. Made a few small corrections, and also removed the couch_util:start_driver/1 function and updated the ICU driver etap test. https://github.com/kocolosk/couchdb/commit/d0aa33 The third patch is a trivial one, so from my perspective I think these changes are ready to be committed. I'll rebase my 1010-config-change-supervision-tree onto trunk and do so later today. Thanks Benoit for getting this started.
        Hide
        Adam Kocoloski added a comment -

        Committed to trunk

        Show
        Adam Kocoloski added a comment - Committed to trunk

          People

          • Assignee:
            Adam Kocoloski
            Reporter:
            Benoit Chesneau
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development