CouchDB
COUCHDB-1341

calculate replication id using only db name in remote case

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      Currently, if the source or target in a replication spec contains user/pwd information, it gets encoded in the replication id, which can cause restarts if the password changes. Change it to use just the db name, as the local case does. Here's a draft[1] of a solution.

      [1] https://github.com/bdionne/couchdb/compare/master...9767-fix-repl-id
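The problem can be illustrated with a minimal Python sketch (CouchDB's actual replication-ID code is Erlang; the hash function and key layout here are simplified assumptions, not the real algorithm):

```python
import hashlib

def replication_id(source, target):
    # Pre-fix behaviour, simplified: hash the full source and target strings,
    # credentials included, to derive the replication ID.
    return hashlib.md5(f"{source}|{target}".encode()).hexdigest()

before = replication_id("http://admin:old-pw@server.com/db", "local_db")
after = replication_id("http://admin:new-pw@server.com/db", "local_db")
assert before != after  # rotating the password changes the ID, so the
                        # replication loses its checkpoints and restarts
```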

        Activity

        Filipe Manana added a comment -

        Hi Bob, that's definitely an issue.

        However, I think there are 2 issues with this approach of considering only the remote db name and excluding the server/port component of the URL.

        Imagine that on a CouchDB instance you trigger these 2 replications:

        (replication 1)

        { "source": "http://server1.com/foo", "target": "bar" }

        (replication 2)

        { "source": "http://server2.com/foo", "target": "bar" }

        From what I understand, both will have the same replication ID with this patch, right?
        If so, you can't have both replications running in parallel: one of them will get a conflict error when updating its checkpoint local doc (because it's the same doc for both).

        Also, if you start replication 1, followed by replication 2 and then followed by replication 1 again, we can lose the benefits of the checkpointing.
        Suppose you start replication 1, after it finishes the checkpoint document's most recent history entry has source sequence 1 000 000.
        Then you start replication 2. Because the ID is the same as replication 1, it will overwrite the checkpoint document. If it checkpoints more than 50 times (the maximum history length), all checkpoint entries from replication 1 are gone. When it finishes, if you start replication 1 again, it will no longer find entries in the checkpoint history related to it, so the replication will start from sequence 0 instead of 1 000 000.

        Basically, if we have a source or target like "http://user:password@server.com/dbname", I think we should consider everything from the URL except the password (eventually the protocol as well).
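Filipe's collision scenario and his suggested fix can be sketched in Python (an illustration only; the real Erlang code differs, and the key layout below is an assumption):

```python
import hashlib
from urllib.parse import urlsplit

def id_dbname_only(url):
    # Draft-patch behaviour as Filipe reads it: only the db name counts.
    return hashlib.md5(urlsplit(url).path.encode()).hexdigest()

def id_without_password(url):
    # Filipe's suggestion: keep scheme, user, host, and port; drop only the password.
    p = urlsplit(url)
    key = f"{p.scheme}://{p.username or ''}@{p.hostname}:{p.port or 5984}{p.path}"
    return hashlib.md5(key.encode()).hexdigest()

r1 = "http://server1.com/foo"
r2 = "http://server2.com/foo"
assert id_dbname_only(r1) == id_dbname_only(r2)            # collision: same ID
assert id_without_password(r1) != id_without_password(r2)  # distinct checkpoints
```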

        Bob Dionne added a comment -

        Filipe,

        I think that makes sense. Originally I had done just that: used couch_util:url_strip_password to remove just the password, but somehow I was convinced I could take out more. I'll rework it. Thanks for taking a look,

        Bob
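For reference, a hypothetical Python analogue of the helper Bob mentions (couch_util:url_strip_password masks the password component of a URL; this regex-based sketch approximates that behaviour and is not the actual Erlang code):

```python
import re

def url_strip_password(url):
    # Mask the password in "scheme://user:password@host/..." with "*****";
    # URLs without embedded credentials pass through unchanged.
    return re.sub(r"(://[^:/@]+):[^@/]+@", r"\1:*****@", url)

url_strip_password("http://bob:s3cret@server.com:5984/dbname")
# -> "http://bob:*****@server.com:5984/dbname"
```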

        Adam Kocoloski added a comment -

        It's a bit more of a change, but the replicator could log in to the remote server(s) via a POST to /_session and fold the user_ctx into the replication ID to replace the user:password. Seems like that would make for symmetrical treatment of local and remote server credentials.
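Adam's idea, sketched in Python (purely illustrative: replication_id_from_ctx is a hypothetical function, and the user_ctx dict stands in for the userCtx object a POST to /_session would return; no such API exists in this form):

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def replication_id_from_ctx(url, user_ctx):
    # Drop the user:password from the URL and fold the server-reported
    # identity (name + roles) into the ID instead.
    p = urlsplit(url)
    bare = urlunsplit((p.scheme, f"{p.hostname}:{p.port or 5984}", p.path, "", ""))
    ident = f"{user_ctx['name']}|{','.join(sorted(user_ctx['roles']))}"
    return hashlib.md5(f"{bare}|{ident}".encode()).hexdigest()

# The same remote identity yields the same ID regardless of how it was
# authenticated (URL credentials, Authorization header, ...):
ctx = {"name": "bob", "roles": ["replicator"]}
a = replication_id_from_ctx("http://bob:old-pw@server.com/db", ctx)
b = replication_id_from_ctx("http://bob:new-pw@server.com/db", ctx)
assert a == b
```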

        Filipe Manana added a comment -

        Adam, I'm not very keen on using _session.
        If between 2 consecutive requests made by the replicator the cookie expires, then we need extra code to POST to _session again and retry the 2nd request, making it all more complex than it already is. This is much more likely to happen when retrying a request (due to the sort-of exponential retry wait period), or during continuous replications over long periods where both source and target are in sync (no new requests made to the source).

        Robert Newson added a comment -

        The original approach, removing the password, seems the sanest move here. We do want to allow multiple remote replications that only vary by hostname, port and user.

        Adam Kocoloski added a comment -

        Hi Filipe, I wasn't proposing that the replicator use the AuthSession cookie in making any subsequent requests. I agree that's asking for trouble. I just wanted a `whoami` on the remote server instead of relying on the username in the URL. It also seems like it would play more nicely with the advanced case where the user adds an Authorization header to the replication request instead of going for the proto://user:pass@host:port syntax.


          People

          • Assignee:
            Bob Dionne
          • Reporter:
            Bob Dionne