Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
    • Environment:

      Ubuntu 9.10

    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      So we had to restart replication on a server and here's something I noticed.

      At first I restarted the replication via the following command from localhost:

      curl -X POST -d '{"source":"http://localhost:5984/foo", "target":"http://remote:5984/foo"}' http://localhost:5984/_replicate

      In response, futon stats:
      W Processed source update #176841152

      That part is great.

      Last night I did not have immediate access to the shell so I restarted replication from remote (through curl on my mobile):

      curl -X POST -d '{"source":"http://user:pass@public.host:5984/foo", "target":"http://remote:5984/foo"}' http://user:pass@pubic.host:5984/_replicate

      The response in futon this morning:
      W Processed source update #1066

      ... and it kept sitting there like it was stalled and only continued in smaller increments.

      I restarted CouchDB and restarted from localhost - instant jump to 176 million.

      I'm just wondering what might be different, except that one request went against the public interface vs. localhost. I'd assume that replication behaves the same regardless.

        Issue Links

          Activity

          Till Klampaeckel created issue -
          Till Klampaeckel added a comment -

          Spelling!

          Till Klampaeckel made changes - Description
          Randall Leeds added a comment -

          I've run into this problem, too. Couch uses the URI of the source and destination to determine the id for the document which stores replication checkpoints. A while back I proposed adding a uuid to each database that does not get replicated and could uniquely identify each replica. No one seemed to think this was important enough, but I'd be happy to do the work.
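
          An illustration of the effect described above: because the checkpoint document id is derived from the source and target URLs, two different spellings of the same database get unrelated checkpoints. The hashing below is a hypothetical stand-in, not CouchDB's actual id derivation (which lives in the Erlang replicator).

          ```python
          # Illustration only: a stand-in for deriving a replication checkpoint
          # document id from the source/target URLs. The real CouchDB algorithm
          # differs; the point is that different URL spellings of the same
          # database yield different checkpoint ids.
          import hashlib

          def checkpoint_id(source, target):
              # Hypothetical derivation: hash the two URLs together.
              return hashlib.md5((source + "|" + target).encode()).hexdigest()

          local = checkpoint_id("http://localhost:5984/foo",
                                "http://remote:5984/foo")
          public = checkpoint_id("http://user:pass@public.host:5984/foo",
                                 "http://remote:5984/foo")
          # Different ids, so the second replication cannot find the first
          # replication's checkpoint and starts over from update #0.
          assert local != public
          ```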

          Till Klampaeckel added a comment -

          I guess I agree that it's important.

          This is probably not an issue for installs on a smaller scale but gets annoying when you're in for "moar".

          Randall Leeds made changes -
          Link This issue depends on COUCHDB-477 [ COUCHDB-477 ]
          Till Klampaeckel added a comment -

          Until I find time for the wiki:

          The mechanism does not just depend on the host per se, all the following URIs are different to CouchDB even though they point to the same system:

          http://foo:bar@localhost:5984/citations
          http://localhost:5984/citations

          @Randall
          While you're at it doing UUIDs for databases, do you think it would be possible to add a UUID for a server as well? It could be exposed along with the version. It would be very useful in large-scale setups for monitoring etc., e.g. to easily tell shards apart, vs. building external "logic" that revolves around hostnames, port numbers or something similar.

          Damien Katz added a comment -

          Per-database UUIDs have the problem of databases being copied around on the file system, or restored from backup. A better option is to convert the URIs to a canonical format so they always look the same.

          Randall Leeds added a comment -

          Damien, I originally reasoned that this was a good thing because the identity of a database would follow the database itself. I now realize that while this is great for moving dbs it's a disaster for copying them because you create two (possibly diverging) instances which claim to be the same.

          Regarding a canonical format, how does this simple idea sound:
          Generate a random uuid on the #db record, but do not write it out in the header. This way, dbs have a new uuid every time they're opened. Then, the replicator can try to open the bare db name at the end of the URI locally and compare that to the uuid returned from the http info. Then, local dbs can be detected and canonicalized to the bare db name without relying on hostname information that is sensitive to system/dns configuration problems.

          As a side bonus, replicating a database to itself could always be detected at the outset, which reduces the checkpoint conflict scenarios to only ones in which a previous checkpoint save succeeded but the client timed out waiting for the reply, at which point replication should safely restart gracefully. I'm happy to put this together later.
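
          A sketch of the ephemeral-uuid idea above: a fresh uuid is generated each time a db is opened and never written to the header, so "this URL is really a local db" can be detected by comparing the uuid of a locally opened db with the uuid reported over HTTP. All names here (OpenDb, is_local, instance_uuid) are hypothetical, not CouchDB API.

          ```python
          # Sketch, assuming a hypothetical open_local(name) lookup and an
          # "instance_uuid" field in the HTTP db info. Not real CouchDB code.
          import uuid

          class OpenDb:
              def __init__(self, name):
                  self.name = name
                  # New on every open; deliberately NOT persisted in the header,
                  # so a copied or restored file never shares a uuid.
                  self.instance_uuid = uuid.uuid4().hex

          def is_local(url_db_name, http_info, open_local):
              # open_local(name) returns the currently open db, or None.
              db = open_local(url_db_name)
              return db is not None and db.instance_uuid == http_info.get("instance_uuid")
          ```

          With this check, a remote-looking URL that resolves to the local server can be canonicalized to the bare db name without trusting hostname or DNS configuration.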

          Randall Leeds added a comment -

          As far as a canonical format for remote databases:
          1) strip the protocol so http and https aren't different
          2) strip any login info
          3) strip trailing slashes
          4) uri-decode the result (I say decode only because I'm relatively sure it's idempotent, but I don't know if encoding is. In particular, is % safe in a uri?)

          Care would have to be taken to check for and migrate old checkpoint histories.
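
          The four steps above can be sketched with the standard library; this is an illustration of the proposal, not CouchDB code, and the function name is hypothetical.

          ```python
          # Sketch of the proposed canonicalization: strip protocol, strip
          # login info, strip trailing slashes, uri-decode the result.
          from urllib.parse import urlsplit, unquote

          def canonicalize(url):
              parts = urlsplit(url)
              # 1) strip the protocol so http and https aren't different
              # 2) strip any login info (user:pass@)
              host = parts.hostname or ""
              if parts.port:
                  host += ":%d" % parts.port
              # 3) strip trailing slashes
              path = parts.path.rstrip("/")
              # 4) uri-decode the result
              return unquote(host + path)

          # The two URIs from the earlier comment now canonicalize identically:
          assert canonicalize("http://foo:bar@localhost:5984/citations") == \
                 canonicalize("http://localhost:5984/citations")
          ```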

          Randall Leeds made changes -
          Link This issue blocks COUCHDB-477 [ COUCHDB-477 ]
          Randall Leeds made changes -
          Link This issue depends on COUCHDB-477 [ COUCHDB-477 ]
          Paul Joseph Davis made changes -
          Skill Level Regular Contributors Level (Easy to Medium)

            People

            • Assignee:
              Unassigned
              Reporter:
              Till Klampaeckel
            • Votes:
              0
              Watchers:
              0
