Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      Wanted to make sure this doesn't get forgotten in the planning for 1.1. Paul Davis and I independently refactored couch_query_servers. Paul's work is much more comprehensive and includes a switch to emonk:

      http://github.com/davisp/couchdb/tree/emonk

      The work I did is here

      http://github.com/kocolosk/couchdb/tree/COUCHDB-901

      One feature not included in that branch is the ability to limit the number of OS processes. Should be simple to add if my work ends up being merged. I did the refactor because I was having problems with couch_query_servers "forgetting" about OS processes in BigCouch. One of the ets tables held by couch_query_servers would list thousands of processes (and in fact there were thousands of spawned couchjs), but another table would claim that only two were running. After digging through the code a while I became frustrated with all of the tracking of multiple ets tables and rewrote a server that used only one table. Other changes include

      • ability to reuse an OS process when the client that requested it dies.
      • better behavior under config changes - doesn't kill all query servers when [query_servers] or [native_query_servers] block changes
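
      As a rough illustration of the single-table idea and of reusing a process after its client dies, here is a minimal sketch in Python. It is not the Erlang code from either branch; the class and method names (SingleTableProcManager, acquire, client_died) are hypothetical, and integer ids stand in for real couchjs processes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ProcEntry:
    lang: str                      # query server language, e.g. "javascript"
    client: Optional[str] = None   # None means the process is idle


class SingleTableProcManager:
    """Tracks every OS process in one table, so counts cannot disagree."""

    def __init__(self) -> None:
        self.table: Dict[int, ProcEntry] = {}
        self._next_pid = 1

    def acquire(self, lang: str, client: str) -> int:
        # Prefer an idle process for this language over spawning a new one.
        for pid, entry in self.table.items():
            if entry.lang == lang and entry.client is None:
                entry.client = client
                return pid
        pid = self._next_pid  # stand-in for spawning a real OS process
        self._next_pid += 1
        self.table[pid] = ProcEntry(lang=lang, client=client)
        return pid

    def release(self, pid: int) -> None:
        self.table[pid].client = None

    def client_died(self, client: str) -> None:
        # Mark the dead client's processes idle so they can be reused.
        for entry in self.table.values():
            if entry.client == client:
                entry.client = None

    def counts(self) -> Tuple[int, int]:
        # (total processes, busy processes), both derived from the same table.
        busy = sum(1 for e in self.table.values() if e.client is not None)
        return len(self.table), busy
```

      Because both totals come from one table, the "thousands in one table, two in another" mismatch described above cannot arise in this model.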

        Activity

        Paul Joseph Davis added a comment -

        It's not a full switch to emonk, just a refactor to allow supporting emonk, plus some other bits to make the various server process handling less funky.

        The other major thing I did was to split couch_query_servers into two sets of modules. couch_view_server for map/reduce views, and couch_app_server for everything else. Each set of modules has an API module and a set of implementation modules. For instance:

        couch_view_server.erl
        couch_view_server_emonk.erl
        couch_view_server_erlang.erl
        couch_view_server_os.erl

        All calls from external code go to couch_view_server, which returns an opaque term that ends up calling out to the actual implementation module.

        The main things missing from my code are finishing the implementations for the non-emonk app servers and some refactoring in a few places so that things like couch_view_server_os.erl and couch_app_server_os.erl share the same code for managing OS level processes. The couch_proc_manager.erl in Adam's patch looks like exactly what I need for this so that'll help.
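
        The API-module-plus-implementation-modules layout can be sketched roughly as follows. This is a Python illustration only, not the Erlang code in the branch; the function names (get_server, map_doc) and the backend stubs are hypothetical stand-ins for couch_view_server and its implementation modules.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for couch_view_server_os and couch_view_server_erlang.
def _os_map(state: Any, doc: dict) -> List[Any]:
    return [("os", doc["_id"])]


def _erlang_map(state: Any, doc: dict) -> List[Any]:
    return [("erlang", doc["_id"])]


BACKENDS: Dict[str, Callable[[Any, dict], List[Any]]] = {
    "os": _os_map,
    "erlang": _erlang_map,
}

# Callers treat the handle as opaque; only the API functions look inside it.
Handle = Tuple[str, Any]


def get_server(backend: str) -> Handle:
    if backend not in BACKENDS:
        raise ValueError("unknown view server backend: " + backend)
    return (backend, None)  # None is a placeholder for backend state


def map_doc(handle: Handle, doc: dict) -> List[Any]:
    # The API layer unpacks the handle and dispatches to the implementation.
    backend, state = handle
    return BACKENDS[backend](state, doc)
```

        External code would only ever call get_server and map_doc, so adding another backend means adding one entry to the dispatch table.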

        Adam Kocoloski added a comment -

        I added a feature to the couch_proc_manager in my github branch, a "soft limit" on the number of OS processes. In contrast to the code in trunk, which delays requests for OS processes, couch_proc_manager will continue to spawn new processes, but it will only reuse the ones that are returned if the number of active processes is below the soft limit.

        I can see the benefit to a hard limit, but in general I feel pretty uneasy about blocking short requests for an arbitrary amount of time. The view updaters in particular will hold onto an OS process for a very long time. In my opinion a hard limit is probably most useful to protect against malicious intent.
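
        A minimal sketch of the soft-limit behavior, written in Python rather than Erlang and not taken from the branch. It assumes one reading of the description: a returned process is kept for reuse only if the number of live processes, counting the returned one, does not exceed the soft limit. The class name SoftLimitPool and the integer ids are hypothetical.

```python
from typing import List, Set


class SoftLimitPool:
    def __init__(self, soft_limit: int) -> None:
        self.soft_limit = soft_limit
        self.busy: Set[int] = set()
        self.idle: List[int] = []
        self._next_pid = 1

    def acquire(self) -> int:
        # Never block: reuse an idle process if there is one, else spawn.
        if self.idle:
            pid = self.idle.pop()
        else:
            pid = self._next_pid  # stand-in for spawning an OS process
            self._next_pid += 1
        self.busy.add(pid)
        return pid

    def release(self, pid: int) -> bool:
        """Return True if the process was kept for reuse, False if discarded."""
        self.busy.remove(pid)
        live = len(self.busy) + len(self.idle) + 1  # including this process
        if live <= self.soft_limit:
            self.idle.append(pid)
            return True
        return False  # over the soft limit: a real manager would kill it here
```

        Under load the pool grows past the limit instead of delaying callers, and it shrinks back as processes are returned.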

        Paul Joseph Davis added a comment -

        I wonder if we should refactor the view engine's use of external OS processes so that view generation can intermingle with other types of requests. I know we've avoided it in the past, but it's a possible fix for the current issue if we're going to have limits on the number of OS processes.

        Adam Kocoloski added a comment -

        I suppose it would work to return the OS process and re-acquire it after every iteration of do_maps or do_writes, but I'm sure reinitializing the OS proc every time will add significant delays to the indexing.

        Paul Joseph Davis added a comment -

        I was thinking more along the lines of how we currently cache design doc functions. When the view updater grabs a process, the code handing out OS processes would attempt to return an already initialized process. I reckon there would still be overhead to that, I'm just not sure how much.
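
        One way to picture this, as a Python sketch under stated assumptions rather than a description of existing CouchDB code: the pool remembers which design doc each idle process was last initialized with (the "signature" string and the InitAwarePool name are hypothetical) and tells the caller whether reinitialization is still needed.

```python
from typing import Dict, Tuple


class InitAwarePool:
    def __init__(self) -> None:
        self.idle: Dict[int, str] = {}  # pid -> signature of loaded design doc
        self._next_pid = 1

    def acquire(self, ddoc_sig: str) -> Tuple[int, bool]:
        """Return (pid, needs_init)."""
        # Best case: an idle process already initialized for this design doc.
        match = next((p for p, s in self.idle.items() if s == ddoc_sig), None)
        if match is not None:
            del self.idle[match]
            return match, False
        # Otherwise take any idle process; the caller must reinitialize it.
        if self.idle:
            pid, _ = self.idle.popitem()
            return pid, True
        pid = self._next_pid  # stand-in for spawning an OS process
        self._next_pid += 1
        return pid, True

    def release(self, pid: int, ddoc_sig: str) -> None:
        self.idle[pid] = ddoc_sig
```

        The remaining overhead is the lookup plus whatever reinitialization happens on a miss; how large that is in practice is exactly the open question.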

        Jan Lehnardt added a comment -

        What is the state of this? Are we still planning to have this land for 1.1?

        Adam Kocoloski added a comment -

        My contribution is non-essential for 1.1. I haven't heard any reports of CouchDB "losing track" of OS processes outside of BigCouch, so it may be a quirk of the rather aggressive way that BigCouch creates transient clients which request OS processes.

        Paul Joseph Davis added a comment -

        I don't see a reason for this to block 1.1 either.

        Jan Lehnardt added a comment -

        Raising for 1.2.0

        Jan Lehnardt added a comment -

        Bump to 1.3.x.


          People

          • Assignee:
            Unassigned
            Reporter:
            Adam Kocoloski
          • Votes:
            1
            Watchers:
            3
