CouchDB
  1. CouchDB
  2. COUCHDB-1004

list_to_existing_atom is too restrictive as used by couch_rep

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.3, 1.1
    • Component/s: Replication
    • Labels:
      None
    • Environment:

      erlang

    • Skill Level:
      New Contributors Level (Easy)

      Description

      We'd like to additional information to db_info in BigCouch, such as the Q and N constants for a given database. This causes replication to fail when replicating from BigCouch to CouchDB due to the use of list_to_existing_atom in couch_rep:dbinfo(...

      The claim is that list_to_atom pollutes the atoms table, however superficial testing indicates this is not the case, list_to_atom when called repeatedly seems to work fine. If this is true then consider reverting list_to_existing_atom back to list_to_atom.

        Activity

        Hide
        Filipe Manana added a comment -

        I agree. In some cases it's not really an issue.
        I would vote to replace our to_existing_atom/1 function in couch_util with something like:

        %% @spec to_atom(term(), allow_new())
        %% @type allow_new() = boolean()
        to_atom(V, false) ->
        list_to_existing_atom(to_list(V));
        to_atom(V, true) ->
        V1 = to_list(V),
        try
        list_to_existing_atom(V1)
        catch : ->
        list_to_atom(V1)
        end.

        Show
        Filipe Manana added a comment - I agree. In some cases it's not really an issue. I would vote to replace our to_existing_atom/1 function in couch_util with something like: %% @spec to_atom(term(), allow_new()) %% @type allow_new() = boolean() to_atom(V, false) -> list_to_existing_atom(to_list(V)); to_atom(V, true) -> V1 = to_list(V), try list_to_existing_atom(V1) catch : -> list_to_atom(V1) end.
        Hide
        Bob Dionne added a comment -

        Actually I'm not sure we'd want this, if in fact there's no problem with list_to_existing_atom (see email from Klaus about COUCHDB-829)

        I looked at the existing couch_util:to_existing_atom as well as all the places it's used. The existing function returns the string if it doesn't match an existing atom. That seems ok in most of the uses, but .eg. in couch_rep_writer it's not clear that it works correctly

        ~/emacs/couchdb:master$ grep couch_util:to_existing_atom src//.erl
        src/couchdb/couch_httpd.erl: Meth -> couch_util:to_existing_atom(Meth)
        src/couchdb/couch_httpd.erl: 'POST' -> couch_util:to_existing_atom(MethodOverride);
        src/couchdb/couch_httpd.erl: couch_util:to_existing_atom(MochiReq:get(method)),
        src/couchdb/couch_os_process.erl: throw({error, {couch_util:to_existing_atom(Id),Reason}});
        src/couchdb/couch_os_process.erl: throw(

        {couch_util:to_existing_atom(Id),Reason}

        );
        src/couchdb/couch_rep_writer.erl: ErrId = couch_util:to_existing_atom(Error),
        src/couchdb/couch_rep_writer.erl: ErrId = couch_util:to_existing_atom(couch_util:get_value(<<"error">>, Props)),
        src/couchdb/couch_uuids.erl: case couch_util:to_existing_atom(AlgoStr) of

        Show
        Bob Dionne added a comment - Actually I'm not sure we'd want this, if in fact there's no problem with list_to_existing_atom (see email from Klaus about COUCHDB-829 ) I looked at the existing couch_util:to_existing_atom as well as all the places it's used. The existing function returns the string if it doesn't match an existing atom. That seems ok in most of the uses, but .eg. in couch_rep_writer it's not clear that it works correctly ~/emacs/couchdb:master$ grep couch_util:to_existing_atom src/ / .erl src/couchdb/couch_httpd.erl: Meth -> couch_util:to_existing_atom(Meth) src/couchdb/couch_httpd.erl: 'POST' -> couch_util:to_existing_atom(MethodOverride); src/couchdb/couch_httpd.erl: couch_util:to_existing_atom(MochiReq:get(method)), src/couchdb/couch_os_process.erl: throw({error, {couch_util:to_existing_atom(Id),Reason}}); src/couchdb/couch_os_process.erl: throw( {couch_util:to_existing_atom(Id),Reason} ); src/couchdb/couch_rep_writer.erl: ErrId = couch_util:to_existing_atom(Error), src/couchdb/couch_rep_writer.erl: ErrId = couch_util:to_existing_atom(couch_util:get_value(<<"error">>, Props)), src/couchdb/couch_uuids.erl: case couch_util:to_existing_atom(AlgoStr) of
        Hide
        Randall Leeds added a comment -

        I just thought of one alternative.
        BigCouch could send an extra header with the dbinfo request. Couch would ignore this header while BigCouch would use it to decide whether to send the extra info.

        Just a thought.

        Show
        Randall Leeds added a comment - I just thought of one alternative. BigCouch could send an extra header with the dbinfo request. Couch would ignore this header while BigCouch would use it to decide whether to send the extra info. Just a thought.
        Hide
        Adam Kocoloski added a comment - - edited

        A header is certainly doable from the BigCouch perspective. But in general I don't like the fact that we've painted ourselves into an API corner where we cannot extend db.info() in the future without breaking replication.

        I think couch_util:to_existing_atom/1 is a fine replacement in this case. If it's an unknown atom we have a guarantee that the replicator is not going to be looking for that particular field, so we might as well just leave it as a binary.

        Show
        Adam Kocoloski added a comment - - edited A header is certainly doable from the BigCouch perspective. But in general I don't like the fact that we've painted ourselves into an API corner where we cannot extend db.info() in the future without breaking replication. I think couch_util:to_existing_atom/1 is a fine replacement in this case. If it's an unknown atom we have a guarantee that the replicator is not going to be looking for that particular field, so we might as well just leave it as a binary.
        Hide
        Randall Leeds added a comment -

        Adam, if I follow you right you're saying catch the badarg on list_to_existing_atom/1 and leave it as binray. Sounds like a good plan to me.

        Show
        Randall Leeds added a comment - Adam, if I follow you right you're saying catch the badarg on list_to_existing_atom/1 and leave it as binray. Sounds like a good plan to me.
        Hide
        Adam Kocoloski added a comment -

        Yep, and that's exactly how couch_util:to_existing_atom/1 works (sorry, my initial comment just said to_existing_atom which was stupidly vague).

        Show
        Adam Kocoloski added a comment - Yep, and that's exactly how couch_util:to_existing_atom/1 works (sorry, my initial comment just said to_existing_atom which was stupidly vague).
        Hide
        Filipe Manana added a comment -

        I now remember I had the same issue with the new replicator code. I opted for using always binaries:

        https://github.com/fdmanana/couchdb/blob/trunk_new_replicator/src/couchdb/couch_api_wrap.erl#L110

        Like this, all the code that uses the output of that function always does something like:

        couch_util:get_value(<<"key">>, List)

        Using couch_util:to_existing_atom/1, means that further code doesn't know what type of key to use: an atom or a binary. The solution would be to do something like:

        couch_util:get_value(mykey, List, couch_util:get_value(<<"mykey">>, List))

        But that's a bit too much typing imho

        Show
        Filipe Manana added a comment - I now remember I had the same issue with the new replicator code. I opted for using always binaries: https://github.com/fdmanana/couchdb/blob/trunk_new_replicator/src/couchdb/couch_api_wrap.erl#L110 Like this, all the code that uses the output of that function always does something like: couch_util:get_value(<<"key">>, List) Using couch_util:to_existing_atom/1, means that further code doesn't know what type of key to use: an atom or a binary. The solution would be to do something like: couch_util:get_value(mykey, List, couch_util:get_value(<<"mykey">>, List)) But that's a bit too much typing imho
        Hide
        Filipe Manana added a comment -

        Here's the proposed solution

        Show
        Filipe Manana added a comment - Here's the proposed solution
        Hide
        Bob Dionne added a comment -

        Thanks Filipe,

        this solves our bigcouch issue and I'd be +1 on having it included in 1.0.2,

        Bob

        Show
        Bob Dionne added a comment - Thanks Filipe, this solves our bigcouch issue and I'd be +1 on having it included in 1.0.2, Bob
        Hide
        Adam Kocoloski added a comment - - edited

        The proposed solution is fine, but I think you're overcomplicating matters with the code snippet

        couch_util:get_value(mykey, List, couch_util:get_value(<<"mykey">>, List))

        In that example the 'mykey' atom is already in the module, and the module is already loaded, so you know that couch_util:to_existing_atom/1 will give you an atom. There's no need to check for the binary version.

        Show
        Adam Kocoloski added a comment - - edited The proposed solution is fine, but I think you're overcomplicating matters with the code snippet couch_util:get_value(mykey, List, couch_util:get_value(<<"mykey">>, List)) In that example the 'mykey' atom is already in the module, and the module is already loaded, so you know that couch_util:to_existing_atom/1 will give you an atom. There's no need to check for the binary version.
        Hide
        Filipe Manana added a comment -

        Adam, that code snippet is not used by the proposed solution

        Show
        Filipe Manana added a comment - Adam, that code snippet is not used by the proposed solution
        Hide
        Adam Kocoloski added a comment -

        Yep, I know. What I meant was that an alternative solution which left everything as atoms and used couch_util:to_existing_atom/1 would not need to do the double-get_value thing, because the lookup for the binary key would never succeed.

        Show
        Adam Kocoloski added a comment - Yep, I know. What I meant was that an alternative solution which left everything as atoms and used couch_util:to_existing_atom/1 would not need to do the double-get_value thing, because the lookup for the binary key would never succeed.
        Hide
        Filipe Manana added a comment -

        I see what you mean. When it didn't work for me it was maybe because the get_value(atom, ...) was being done on a module other than the module which did the calls to list_to_existing_atom/1. Long time ago and a different code/context.

        Show
        Filipe Manana added a comment - I see what you mean. When it didn't work for me it was maybe because the get_value(atom, ...) was being done on a module other than the module which did the calls to list_to_existing_atom/1. Long time ago and a different code/context.
        Hide
        Jan Lehnardt added a comment -

        Bump, should we get this into 1.0.3/1.1.0 now?

        Show
        Jan Lehnardt added a comment - Bump, should we get this into 1.0.3/1.1.0 now?
        Hide
        Bob Dionne added a comment -

        It would nice, and helpful downstream. I think Adam's proposed trivial change is the simplest, use couch_util:to_existing_atom/1

        Show
        Bob Dionne added a comment - It would nice, and helpful downstream. I think Adam's proposed trivial change is the simplest, use couch_util:to_existing_atom/1
        Hide
        Filipe Manana added a comment -

        Applied Adam's suggestion (use couch_util:to_existing_atom/1) to 1.0.x and 1.1.x.
        Trunk does not suffer from this issue.

        Show
        Filipe Manana added a comment - Applied Adam's suggestion (use couch_util:to_existing_atom/1) to 1.0.x and 1.1.x. Trunk does not suffer from this issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Bob Dionne
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development