CouchDB
  1. CouchDB
  2. COUCHDB-1309

File descriptor leaks on design document update and view cleanup

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: None
    • Labels:
      None

      Description

      If we add a design document with views defined in it, open the corresponding view group (by querying one of its views for e.g.), then update the design document in such a way that the view signature changes (changing a view's map function code for e.g), the old view group remains open forever (unless a server restart happens) and keeps it's view file reference counter active forever.
      If a view cleanup request comes, the old view file is not deleted since the reference counter is not dropped by the old view group, keeping the file descriptor in use forever.

      This leakage is different from what is reported in COUCHDB-1129 but it's somehow related.

      The attached patch, simply shutdowns a view group when the design document is updated and the new view signature changes, releasing the old view file descriptor (as soon as no more clients are using the old view).

      1. 0001-Fix-ets-lookup-case_clause-error-after-ddoc-is-updat.patch
        2 kB
        Filipe Manana
      2. master-0002-Shutdown-view-group-on-ddoc-update.patch
        12 kB
        Filipe Manana
      3. master-0002-Shutdown-view-group-on-ddoc-update.patch
        14 kB
        Filipe Manana
      4. master-0001-Add-ddoc_updated-event.patch
        4 kB
        Filipe Manana
      5. 12x-0002-Shutdown-view-group-on-ddoc-update.patch
        13 kB
        Filipe Manana
      6. 12x-0001-Add-ddoc_updated-event.patch
        4 kB
        Filipe Manana
      7. couchdb-1309_12x.patch
        12 kB
        Filipe Manana
      8. couchdb-1309_trunk.patch
        13 kB
        Filipe Manana

        Issue Links

          Activity

          Hide
          Filipe Manana added a comment -

          the patches for trunk and 1.2.x/1.1.x

          Show
          Filipe Manana added a comment - the patches for trunk and 1.2.x/1.1.x
          Hide
          Benoit Chesneau added a comment -
          Show
          Benoit Chesneau added a comment - I'm not sure it need to be asynchronous. Couldn't we test it here : http://git-wip-us.apache.org/repos/asf?p=couchdb.git;a=blob;f=src/couch_index/src/couch_index_updater.erl;h=853f3d11130c903650450e97b23a692762b3bba1;hb=HEAD#l118 and then crash the updater?
          Hide
          Filipe Manana added a comment -

          Not sure what you mean:

          "I'm not sure it need to be asynchronous. Couldn't we test it here : "

          It's synchronous to avoid getting the process' mailbox flooded in the unlikely case there's a high rate of updates to the same ddoc.
          And the updater shouldn't be crashed. If it's running it means there are clients waiting for it before folding the view. Those clients should not get an error.

          Show
          Filipe Manana added a comment - Not sure what you mean: "I'm not sure it need to be asynchronous. Couldn't we test it here : " It's synchronous to avoid getting the process' mailbox flooded in the unlikely case there's a high rate of updates to the same ddoc. And the updater shouldn't be crashed. If it's running it means there are clients waiting for it before folding the view. Those clients should not get an error.
          Hide
          Benoit Chesneau added a comment - - edited

          using the db update notifier is an asynchronous process for me ...

          By crash I mean in the OTP way. ie crash the updater then the manager restart one with new ddoc. That what you do anyway but it could be done directly in the updater, the manager should take care of such event.

          Show
          Benoit Chesneau added a comment - - edited using the db update notifier is an asynchronous process for me ... By crash I mean in the OTP way. ie crash the updater then the manager restart one with new ddoc. That what you do anyway but it could be done directly in the updater, the manager should take care of such event.
          Hide
          Filipe Manana added a comment -

          Benoit, what I meant by synchronous is the handle_call vs handle_cast in couch_index.erl.

          I'm afraid you're missing some details about how the view system works. There's more than the updater process that depends on the design document properties and view signature, like couch_index_server.
          Spawning a new group when the ddoc signature changes is not a new behaviour I'm adding - it was always like this afaik.
          I'm concentrated on fixing this particular issue and not redesigning a big part of the view system. Feel free to do it and provide a working patch.

          Show
          Filipe Manana added a comment - Benoit, what I meant by synchronous is the handle_call vs handle_cast in couch_index.erl. I'm afraid you're missing some details about how the view system works. There's more than the updater process that depends on the design document properties and view signature, like couch_index_server. Spawning a new group when the ddoc signature changes is not a new behaviour I'm adding - it was always like this afaik. I'm concentrated on fixing this particular issue and not redesigning a big part of the view system. Feel free to do it and provide a working patch.
          Hide
          Benoit Chesneau added a comment -

          Or maybe I'm not. I'm not speaking about redesigning how the view system works neither I don't want to fix this issue and I'm not sure why you are suggesting that.

          I'm trying to remove the need of this another asynchronous process. You don't really need it. Once the updater detect that this particular design doc has changed (and it will) it can simply send a synchronous signal to its parent. then the parent will do exactly the same you are doing in your patch. That's all. I don't see why it couldn't work. I will have a closer look later on it.

          Show
          Benoit Chesneau added a comment - Or maybe I'm not. I'm not speaking about redesigning how the view system works neither I don't want to fix this issue and I'm not sure why you are suggesting that. I'm trying to remove the need of this another asynchronous process. You don't really need it. Once the updater detect that this particular design doc has changed (and it will) it can simply send a synchronous signal to its parent. then the parent will do exactly the same you are doing in your patch. That's all. I don't see why it couldn't work. I will have a closer look later on it.
          Hide
          Filipe Manana added a comment -

          Yes, that's assuming the updater will run again after the design document update. What if it doesn't? You still end up in the current situation.

          Show
          Filipe Manana added a comment - Yes, that's assuming the updater will run again after the design document update. What if it doesn't? You still end up in the current situation.
          Hide
          Benoit Chesneau added a comment - - edited

          Sending a synchronous call doesn't mean it needs to stay alive after that, just that we send a message to the parent synchronously before dying. On the other hand it allows to not spawn another process. A custom exit signal could be enough.

          Show
          Benoit Chesneau added a comment - - edited Sending a synchronous call doesn't mean it needs to stay alive after that, just that we send a message to the parent synchronously before dying. On the other hand it allows to not spawn another process. A custom exit signal could be enough.
          Hide
          Paul Joseph Davis added a comment -

          Hrm. Good catch Filipe.

          I haven't done more than skim the patch, but I'm wondering if it might be easier to just shutdown the view group when it sees the new design doc during an update instead of adding all of the mechanics to listen for update notifications. I'll read through this more when I get done figuring out how deb packages work.

          Show
          Paul Joseph Davis added a comment - Hrm. Good catch Filipe. I haven't done more than skim the patch, but I'm wondering if it might be easier to just shutdown the view group when it sees the new design doc during an update instead of adding all of the mechanics to listen for update notifications. I'll read through this more when I get done figuring out how deb packages work.
          Hide
          Benoit Chesneau added a comment -

          @davisp mostly what I had in head.

          Show
          Benoit Chesneau added a comment - @davisp mostly what I had in head.
          Hide
          Filipe Manana added a comment -

          Benoit, you can't rely on the updater to detect that the ddoc changed.
          Maybe I wasn't clear enough before. Imagine the following scenario:

          1) Create a ddoc
          2) Query one of its views
          3) Update the ddoc

          If all subsequent view query requests arrive after the update (3), they will get routed to the new view group - therefore the old one will not get its updater running again and will stay alive forever.
          Same issue happens if when the update happens clients who get "into" the old view group are querying only with ?stale=ok.

          If you're so convinced that doing it in the updater works for these 2 cases, please provide a working code prototype.

          Show
          Filipe Manana added a comment - Benoit, you can't rely on the updater to detect that the ddoc changed. Maybe I wasn't clear enough before. Imagine the following scenario: 1) Create a ddoc 2) Query one of its views 3) Update the ddoc If all subsequent view query requests arrive after the update (3), they will get routed to the new view group - therefore the old one will not get its updater running again and will stay alive forever. Same issue happens if when the update happens clients who get "into" the old view group are querying only with ?stale=ok. If you're so convinced that doing it in the updater works for these 2 cases, please provide a working code prototype.
          Hide
          Paul Joseph Davis added a comment -

          Filipe,

          Good call. That code I'm remember is probably from before we used signatures for view groups instead of just the design name.

          Show
          Paul Joseph Davis added a comment - Filipe, Good call. That code I'm remember is probably from before we used signatures for view groups instead of just the design name.
          Hide
          Benoit Chesneau added a comment -

          I'm pretty sure things could be handled with a less offensive tone. Thanks. Tickets are here to let other review.

          Like I said I will have a closer look later. I don't think we need another asynchronous process and more I'm not sure for a performance reason that the change in update_notifier is really wanted.

          Even if we take this patch the notifier may need to be monitored, etc.

          Show
          Benoit Chesneau added a comment - I'm pretty sure things could be handled with a less offensive tone. Thanks. Tickets are here to let other review. Like I said I will have a closer look later . I don't think we need another asynchronous process and more I'm not sure for a performance reason that the change in update_notifier is really wanted. Even if we take this patch the notifier may need to be monitored, etc.
          Hide
          Paul Joseph Davis added a comment -

          @Filipe

          After reading further I think you've probably got the best approach. The only alternative I've contemplated would be to have a monitor process that wakes up every so often and checks that its sig is still valid which I don't think is as clean as this version.

          Although, I would like to see this patch as at least two if not more commits. Specifically, at least one for the new ddoc event broadcasting and then one for the indexer/view stuff.

          Show
          Paul Joseph Davis added a comment - @Filipe After reading further I think you've probably got the best approach. The only alternative I've contemplated would be to have a monitor process that wakes up every so often and checks that its sig is still valid which I don't think is as clean as this version. Although, I would like to see this patch as at least two if not more commits. Specifically, at least one for the new ddoc event broadcasting and then one for the indexer/view stuff.
          Hide
          Benoit Chesneau added a comment -

          Ok, the update detection can't happen in updater, if this isn't triggered.

          Anyway I think we could remove this other listener by reusing the update notification process in couch_index_server, something like (not tested):

          http://friendpaste.com/7CwKM5clixdyuq6HWHjDDv

          I'm also not sure we need to know the exact design id, isn't emitting an 'refresh_indexes' event enough since we have all the info in states to detect signature change if it happends? It woud remove this list in memory and extra event notifications. Something like :

          http://friendpaste.com/44q84nzKSjaU15vX2sVkfF

          I will try to send a complete patch later in the day, but not sure I will have time. Btw I'm only worrying about the trunk here. I think we have some time to solve this problem on trunk. No need for a quick fix.

          Show
          Benoit Chesneau added a comment - Ok, the update detection can't happen in updater, if this isn't triggered. Anyway I think we could remove this other listener by reusing the update notification process in couch_index_server, something like (not tested): http://friendpaste.com/7CwKM5clixdyuq6HWHjDDv I'm also not sure we need to know the exact design id, isn't emitting an 'refresh_indexes' event enough since we have all the info in states to detect signature change if it happends? It woud remove this list in memory and extra event notifications. Something like : http://friendpaste.com/44q84nzKSjaU15vX2sVkfF I will try to send a complete patch later in the day, but not sure I will have time. Btw I'm only worrying about the trunk here. I think we have some time to solve this problem on trunk. No need for a quick fix.
          Hide
          Jan Lehnardt added a comment -

          @Benoit, wouldn't this send a notification to every updater in every database every time a single design doc gets updated?

          I agree it is less code, but it sure sounds like a very entangled result when a ddoc in database A changes and the updater for a ddoc in database B receives the notification. Especially in a situation with a lot of databases (e.g. a db per user) and design docs in each.

          Show
          Jan Lehnardt added a comment - @Benoit, wouldn't this send a notification to every updater in every database every time a single design doc gets updated? I agree it is less code, but it sure sounds like a very entangled result when a ddoc in database A changes and the updater for a ddoc in database B receives the notification. Especially in a situation with a lot of databases (e.g. a db per user) and design docs in each.
          Hide
          Filipe Manana added a comment - - edited

          Benoit, when I asked you to provide sample code is just because I wasn't understand your point. Not meant for you to take it as an offence, but as a better/more clear way to express your ideas.

          Regarding the first solution, as Jan pointed out, your notifying every index about every ddoc update.
          To avoid this, one of the ets tables (?BY_DB maybe) should also store the ID of the ddoc associated to the indexer.

          Show
          Filipe Manana added a comment - - edited Benoit, when I asked you to provide sample code is just because I wasn't understand your point. Not meant for you to take it as an offence, but as a better/more clear way to express your ideas. Regarding the first solution, as Jan pointed out, your notifying every index about every ddoc update. To avoid this, one of the ets tables (?BY_DB maybe) should also store the ID of the ddoc associated to the indexer.
          Hide
          Paul Joseph Davis added a comment -

          I've been thinking about this and it occurs to me that adding the new function to the index seems like the wrong way to go about this. Seems to me that couch_index_server and couch_index should take care of all of this without requiring each index to reimplement this simple code. I'll take a quick pass through later to try and sketch out what i mean.

          Show
          Paul Joseph Davis added a comment - I've been thinking about this and it occurs to me that adding the new function to the index seems like the wrong way to go about this. Seems to me that couch_index_server and couch_index should take care of all of this without requiring each index to reimplement this simple code. I'll take a quick pass through later to try and sketch out what i mean.
          Hide
          Jan Lehnardt added a comment -

          @Paul, sounds like an optimisation to me, would it make sense to get Filipe's patch in and then open a new ticket for your approach?

          Show
          Jan Lehnardt added a comment - @Paul, sounds like an optimisation to me, would it make sense to get Filipe's patch in and then open a new ticket for your approach?
          Hide
          Filipe Manana added a comment -

          Updated patches to reuse the update notifier from couch_view (1.2.x) / couch_index_server (master)

          Show
          Filipe Manana added a comment - Updated patches to reuse the update notifier from couch_view (1.2.x) / couch_index_server (master)
          Hide
          Paul Joseph Davis added a comment -

          So reviewing this again, my last comment still stands. There's no reason to introduce the ddoc_changed function into the indexer API. These changes should be limited entirely to couch_index (minus the tests anyway). This is a simple fix. Just use Mod:get(signature, Mod:init(Db, DDoc)) (or similar) in the with_db call in couch_index:handle_cast/2 call for ddoc_updated.

          The other thing that occurred to me over lunch is if we might want to reimplement this by just having couch_index close itself after a period of inactivity. At the moment there's no limit on the number of groups that are opened like we have for databases. Some sort of mechanism to close these things down after lack of use seems like a good idea. The patch to do this would be pretty trivial. All you have to do is add a Timeout to each reply/noreply response in the couch_index gen_server, and then have a handle_info(timeout) clause that returns

          {stop, normal, State}

          .

          Other than that, I've heard other people wanting the update notifications for design docs so that could go in regardless. Another thing that we might want to think about is best practices for commits and whether commits should span applications. I don't have super strong feelings either way though I have heard the argument that not spanning applications decreases the chance that code starts getting super entangled.

          Good work, Filipe

          Show
          Paul Joseph Davis added a comment - So reviewing this again, my last comment still stands. There's no reason to introduce the ddoc_changed function into the indexer API. These changes should be limited entirely to couch_index (minus the tests anyway). This is a simple fix. Just use Mod:get(signature, Mod:init(Db, DDoc)) (or similar) in the with_db call in couch_index:handle_cast/2 call for ddoc_updated. The other thing that occurred to me over lunch is if we might want to reimplement this by just having couch_index close itself after a period of inactivity. At the moment there's no limit on the number of groups that are opened like we have for databases. Some sort of mechanism to close these things down after lack of use seems like a good idea. The patch to do this would be pretty trivial. All you have to do is add a Timeout to each reply/noreply response in the couch_index gen_server, and then have a handle_info(timeout) clause that returns {stop, normal, State} . Other than that, I've heard other people wanting the update notifications for design docs so that could go in regardless. Another thing that we might want to think about is best practices for commits and whether commits should span applications. I don't have super strong feelings either way though I have heard the argument that not spanning applications decreases the chance that code starts getting super entangled. Good work, Filipe
          Hide
          Filipe Manana added a comment -

          Thanks Paul.

          Yes, that point was not addressed as I wasn't seeing what your idea was. You were supposed to explain it, according to the previous comment, but you forgot to do it until now.

          About the shutdown after an inactivity period, I considered that some time ago (I think we even discussed that on irc). However that's a separate issue.

          Updated patch master-0002-Shutdown-view-group-on-ddoc-update.patch

          Show
          Filipe Manana added a comment - Thanks Paul. Yes, that point was not addressed as I wasn't seeing what your idea was. You were supposed to explain it, according to the previous comment, but you forgot to do it until now. About the shutdown after an inactivity period, I considered that some time ago (I think we even discussed that on irc). However that's a separate issue. Updated patch master-0002-Shutdown-view-group-on-ddoc-update.patch
          Hide
          Filipe Manana added a comment -

          Applied to master and branch 1.2.x

          Show
          Filipe Manana added a comment - Applied to master and branch 1.2.x
          Hide
          Filipe Manana added a comment -

          Found a race condition that can happen and results in a case_clause error inside the view manager (couch_view / couch_index_server (master)).

          Patch attached, 1.2.0 blocker.

          Show
          Filipe Manana added a comment - Found a race condition that can happen and results in a case_clause error inside the view manager (couch_view / couch_index_server (master)). Patch attached, 1.2.0 blocker.
          Hide
          Filipe Manana added a comment -

          Fix applied to 1.2.x and master.

          Show
          Filipe Manana added a comment - Fix applied to 1.2.x and master.

            People

            • Assignee:
              Filipe Manana
              Reporter:
              Filipe Manana
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development