CouchDB
  1. CouchDB
  2. COUCHDB-1445

CouchDB deletes .view files if it can't open them, even if the error is "emfile".

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.3
    • Fix Version/s: 1.2, 1.3
    • Component/s: JavaScript View Server
    • Labels:
      None

      Description

      Via Stefan Kögl on dev@:

      Another thing I noticed during my tests of CouchDB 1.2.x. I redirected
      live traffic to the instance and after a rather short time, requests
      were failing with the following information in the logs:

      [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
      {error_report,<0.31.0>,
      {<0.27554.2>,std_error,
      [

      {application,mochiweb}

      ,
      "Accept failed error",
      "

      {error,emfile}

      "]}}
      [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
      {error_report,<0.31.0>,
      {<0.27554.2>,crash_report,
      [[{initial_call,
      {mochiweb_acceptor,init,
      ['Argument_1','Argument_2',
      'Argument__3']}},

      {pid,<0.27554.2>}

      ,

      {registered_name,[]}

      ,
      {error_info,
      {exit,

      {error,accept_failed}

      ,
      [

      {mochiweb_acceptor,init,3}

      ,

      {proc_lib,init_p_do_apply,3}

      ]}},

      {ancestors, [couch_httpd,couch_secondary_services, couch_server_sup,<0.32.0>]}

      ,

      {messages,[]}

      ,

      {links,[<0.129.0>]}

      ,

      {dictionary,[]}

      ,

      {trap_exit,false}

      ,

      {status,running}

      ,

      {heap_size,233}

      ,

      {stack_size,24}

      ,

      {reductions,244}

      ],
      []]}}

      I think "emfile" means that CouchDB (or mochiweb?) couldn't open any
      more files / connections. I've set the (hard and soft) nofile limit for
      user couchdb to 4096, but didn't raise the ERL_MAX_PORTS accordingly.
      Anyway, as soon as the error occured, CouchDB started writing most of my
      view files from scratch, rendering the instance unusable.

      I'd expect CouchDB to fail more gracefully when the maximum number of
      open files is reached. Is this a bug or expected behaviour?

        Activity

        Hide
        Jan Lehnardt added a comment -

        Randall Leeds replies:

        Looks like a bug. Whenever there's a problem opening a view file,
        couch_view tries to delete it. Clearly, this is not the right course of
        action when the problem is due to emfile.

        Here's a patch that I propose might fix it. I'd like to hear from another
        dev on this, or if there's a better way we should bail out.

        diff --git a/src/couchdb/couch_view_group.erl
        b/src/couchdb/couch_view_group.erl
        index 97fc512..ab075bd 100644
        — a/src/couchdb/couch_view_group.erl
        +++ b/src/couchdb/couch_view_group.erl
        @@ -469,6 +469,10 @@ open_index_file(RootDir, DbName, GroupSig) ->
        case couch_file:open(FileName) of

        {ok, Fd} -> {ok, Fd}

        ;

        {error, enoent}

        -> couch_file:open(FileName, [create]);
        +

        {error, emfile} ->
        + ?LOG_ERROR("Could not open file for view index: max open files
        reached. "
        + "Raise ERL_MAX_PORTS or system limits.", []),
        + throw({error, emfile}

        );
        Error -> Error
        end.

        Show
        Jan Lehnardt added a comment - Randall Leeds replies: Looks like a bug. Whenever there's a problem opening a view file, couch_view tries to delete it. Clearly, this is not the right course of action when the problem is due to emfile. Here's a patch that I propose might fix it. I'd like to hear from another dev on this, or if there's a better way we should bail out. diff --git a/src/couchdb/couch_view_group.erl b/src/couchdb/couch_view_group.erl index 97fc512..ab075bd 100644 — a/src/couchdb/couch_view_group.erl +++ b/src/couchdb/couch_view_group.erl @@ -469,6 +469,10 @@ open_index_file(RootDir, DbName, GroupSig) -> case couch_ file:open(FileName ) of {ok, Fd} -> {ok, Fd} ; {error, enoent} -> couch_ file:open(FileName , [create] ); + {error, emfile} -> + ?LOG_ERROR("Could not open file for view index: max open files reached. " + "Raise ERL_MAX_PORTS or system limits.", []), + throw({error, emfile} ); Error -> Error end.
        Hide
        Jan Lehnardt added a comment -

        Git blame leads to:

        commit 56a3ee28e006aa42150482e1c3f91dc1906273f9
        Author: Damien F. Katz <damien@apache.org>
        Date: Fri Dec 12 05:23:37 2008 +0000

        modifications to view server to keep the file descriptor open for the life of the view group.

        Show
        Jan Lehnardt added a comment - Git blame leads to: commit 56a3ee28e006aa42150482e1c3f91dc1906273f9 Author: Damien F. Katz <damien@apache.org> Date: Fri Dec 12 05:23:37 2008 +0000 modifications to view server to keep the file descriptor open for the life of the view group.
        Hide
        Jan Lehnardt added a comment -

        @Randall & others: what other conditions like emfile can happen where creating the file from scratch is not the right cause of action?

        Show
        Jan Lehnardt added a comment - @Randall & others: what other conditions like emfile can happen where creating the file from scratch is not the right cause of action?
        Hide
        Robert Newson added a comment -

        let's backport to 1.1.x just in case.

        Show
        Robert Newson added a comment - let's backport to 1.1.x just in case.
        Hide
        Robert Newson added a comment -

        To summarize couchdb-dev IRC chat, we (Randall, Jan, myself) don't think there's any error condition when opening a view file that warrants blowing the view file away. The behaviour dates to 2008 where, perhaps, it was considered better to automatically recover from any error in a view file by starting over. CouchDB's view handling is well-tested now and so this pessimistic recovery mechanism is overly aggressive.

        Instead, we will log an error to the log file and throw in the code.

        Show
        Robert Newson added a comment - To summarize couchdb-dev IRC chat, we (Randall, Jan, myself) don't think there's any error condition when opening a view file that warrants blowing the view file away. The behaviour dates to 2008 where, perhaps, it was considered better to automatically recover from any error in a view file by starting over. CouchDB's view handling is well-tested now and so this pessimistic recovery mechanism is overly aggressive. Instead, we will log an error to the log file and throw in the code.
        Hide
        Paul Joseph Davis added a comment -

        Yep. This is a pretty good explanation and solution. This behavior has been there forever. I always understood the intent to be what this ticket decided (rewriting views is fine, so just trying to blow them away and rewriting to solve errors is acceptable). I agree that we can switch this to a log and abort the open. The exact implementation of which can be in a couple places but I'm not overly tied to any particular one.

        One caveat, I'm pretty sure there's a function that works out to be the equivalent of strerror. We should make use of that rather than trying to generate our own posix error messages.

        Show
        Paul Joseph Davis added a comment - Yep. This is a pretty good explanation and solution. This behavior has been there forever. I always understood the intent to be what this ticket decided (rewriting views is fine, so just trying to blow them away and rewriting to solve errors is acceptable). I agree that we can switch this to a log and abort the open. The exact implementation of which can be in a couple places but I'm not overly tied to any particular one. One caveat, I'm pretty sure there's a function that works out to be the equivalent of strerror. We should make use of that rather than trying to generate our own posix error messages.
        Hide
        Paul Joseph Davis added a comment -

        Wow. Took me a while to find it, but its (appropriately named) erl_posix_msg:message/1 which takes the reason from the posix tuple.

        Show
        Paul Joseph Davis added a comment - Wow. Took me a while to find it, but its (appropriately named) erl_posix_msg:message/1 which takes the reason from the posix tuple.
        Hide
        Paul Joseph Davis added a comment -

        And then Randall reminds me its actually file:format_error. Note to future selves, don't bother trying to grep for strerror or perror in Erlang sources.

        Show
        Paul Joseph Davis added a comment - And then Randall reminds me its actually file:format_error . Note to future selves, don't bother trying to grep for strerror or perror in Erlang sources.
        Hide
        Randall Leeds added a comment -

        Fixed on 1.1.x and 1.2.x. Blocking master for 1.3.

        To https://git-wip-us.apache.org/repos/asf/couchdb.git
        29eac04..94e72e7 1.2.x -> 1.2.x
        785d32f..24a61fd 1.1.x -> 1.1.x

        Show
        Randall Leeds added a comment - Fixed on 1.1.x and 1.2.x. Blocking master for 1.3. To https://git-wip-us.apache.org/repos/asf/couchdb.git 29eac04..94e72e7 1.2.x -> 1.2.x 785d32f..24a61fd 1.1.x -> 1.1.x
        Hide
        Benoit Chesneau added a comment -

        That's here where I really like diff, instead having to go in another branch to find useful changes.

        How do we proceed to add these changes to the HEAD ? Using 1.2 or 1.1 history ? 1 commit ?

        Show
        Benoit Chesneau added a comment - That's here where I really like diff, instead having to go in another branch to find useful changes. How do we proceed to add these changes to the HEAD ? Using 1.2 or 1.1 history ? 1 commit ?
        Hide
        Paul Joseph Davis added a comment -

        It'd probably have to be a manual merge of the diff from 1.2 since the view engine is so different.

        Show
        Paul Joseph Davis added a comment - It'd probably have to be a manual merge of the diff from 1.2 since the view engine is so different.
        Hide
        Benoit Chesneau added a comment -

        ok, i'm on it , will provide the patch this afternoon.

        Show
        Benoit Chesneau added a comment - ok, i'm on it , will provide the patch this afternoon.
        Hide
        Robert Newson added a comment -

        Not fixed on master. I am trying to fix that but haven't succeeded yet (porting the test is fiddly).

        Show
        Robert Newson added a comment - Not fixed on master. I am trying to fix that but haven't succeeded yet (porting the test is fiddly).
        Hide
        Randall Leeds added a comment -

        This was fixed before the 1.2 release. It's open still because it needs a forward-port, which I've committed myself to doing.

        Show
        Randall Leeds added a comment - This was fixed before the 1.2 release. It's open still because it needs a forward-port, which I've committed myself to doing.
        Hide
        Robert Newson added a comment -

        Removing 1.0.4/1.1.2.

        Show
        Robert Newson added a comment - Removing 1.0.4/1.1.2.
        Hide
        Robert Newson added a comment -

        Randall resolved this a few days ago but forgot to close the ticket like the naughty person he is.

        Show
        Robert Newson added a comment - Randall resolved this a few days ago but forgot to close the ticket like the naughty person he is.

          People

          • Assignee:
            Randall Leeds
            Reporter:
            Jan Lehnardt
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development