CouchDB / COUCHDB-2980

Replicator DB on 15984 replicates to backdoor ports

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None
    • Skill Level: Committers Level (Medium to Hard)

      Description

      If you POST a doc into the replicator database, a replication is kicked off and finishes successfully (using the usual 5984 port, which maps to 15984 via haproxy).

      The problem is that the DB is replicated to the backdoor ports (15986) and is not visible on the other ports.

          Activity

          kxepal Alexander Shorin added a comment -

          What kind of POST should that be?

          robertkowalski Robert Kowalski added a comment - - edited

          POST a simple replication doc to $DB/_replicator to kick off a replication:

          {
            "_id": "my_rep2",
            "source": "http://localhost:5984/animaldb",
            "target": "copyanimaldbrepbug",
            "continuous": false,
            "create_target": true,
            "user_ctx": { "name": "YOU", "roles": [ "_admin" ] }
          }

          rnewson Robert Newson added a comment -

          So we have a fun called possibly_hack that fixes up local source/target for _replicate calls, but there is no equivalent for _replicator docs.

          I'm inclined to prohibit "local" source/target in _replicator docs by tweaking the validate_doc_update function.

          Fixing this naturally would involve making _replicator documents behave differently based on whether they came from a clustered or non-clustered database, which might be tricky.

          robertkowalski Robert Kowalski added a comment -

          other example:

          <code>
          {
            "_id": "my_rep2",
            "_rev": "3-b66e82cfb790a314ee5f9278860a00a9",
            "source": "http://rockoartischocko:MYPASSWORD@rockoartischocko.cloudant.com/animaldb",
            "target": "animaldbfromcloudant",
            "continuous": false,
            "create_target": true,
            "user_ctx": { "name": "YOU", "roles": [ "_admin" ] },
            "owner": null,
            "_replication_state": "completed",
            "_replication_state_time": "2016-06-09T14:21:49+02:00",
            "_replication_id": "25221306efcdeb84fdd16e7bcbe9438b",
            "_replication_stats": {
              "revisions_checked": 16,
              "missing_revisions_found": 16,
              "docs_read": 16,
              "docs_written": 16,
              "changes_pending": null,
              "doc_write_failures": 0,
              "checkpointed_source_seq": "18-g1AAAAGjeJzLYWBgYMlgTmGQT0lKzi9KdUhJMtPLSs1LLUst0kvOyS9NScwr0ctLLckBKmRKZEiy____f1YGcyJ3LlCAPck4zdggKY2wdlQrTHBbkeQAJJPqobYwgW0xNUszsrQwJWwC0R7JYwGSDA1ACmjRfoRN5oZJyUnmBqT6x4KQTQcgNoH9xAy2yczCOMUyOY2wKVkAq-mIRA"
            }
          }
          </code>

          githubbot ASF GitHub Bot added a comment -

          GitHub user rnewson opened a pull request:

          https://github.com/apache/couchdb-couch-replicator/pull/41

          ban local endpoints

          Using "local" names in source and target yields unexpected behaviour (creating unsharded dbs which are also, by default, unreachable). This patch insists that "source" and "target" are http or https URL's.

          COUCHDB-2980

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/cloudant/couchdb-couch-replicator 2980-ban-local-endpoints

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch-replicator/pull/41.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #41


          commit 12a5e2ac0d47d942327133a996e6065292c4f213
          Author: Robert Newson <rnewson@apache.org>
          Date: 2016-06-09T14:39:04Z

          Ensure _design/_replicator VDU is updated

          commit 69558f31f52c14ea8bee510e69111b0e00f85fe8
          Author: Robert Newson <rnewson@apache.org>
          Date: 2016-06-09T14:39:19Z

          Insist on http/https url's for source and target
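
The rule this pull request describes (source and target must be http or https URLs) can be illustrated with a minimal Python sketch. This is not the actual validate_doc_update code from the patch. The function names are hypothetical, and accepting an object with a "url" field as an endpoint is an assumption of the sketch.

```python
from urllib.parse import urlparse


def is_remote_endpoint(endpoint):
    """Return True if a replication endpoint is an http or https URL.

    Assumption: an endpoint is either a string or an object (dict)
    carrying the URL in a "url" field.
    """
    if isinstance(endpoint, dict):
        endpoint = endpoint.get("url")
    if not isinstance(endpoint, str):
        return False
    parsed = urlparse(endpoint)
    # A bare database name such as "copyanimaldbrepbug" has no scheme or host.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_replication_doc(doc):
    """Raise ValueError if source or target is a bare ("local") name."""
    for field in ("source", "target"):
        if not is_remote_endpoint(doc.get(field)):
            raise ValueError(f"{field} must be an http or https URL")
```

With this check, the first example document in this issue would be rejected because its target, "copyanimaldbrepbug", is a bare name.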


          rnewson Robert Newson added a comment -

          I don't think this can be fixed to match < 2.0 behaviour. A local source or target is being honoured correctly, it's just (probably) not what the user intended. It doesn't "replicate to backdoor ports", it's reading/writing directly, not using http.

          "foo" in the :5986/_replicator db works as expected and it's not entirely unreasonable that "foo" in the :5984/_replicator means exactly the same thing.

          I don't think it's appropriate to prohibit local source/target unless we will do so for the node-local :5986/_replicator database as well as the clustered :5984/_replicator database.

          The hack in chttpd.erl is actually quite bad. It uses http (not https, even if available) and uses the local node's public IP address, so it is not fault-tolerant.

          Still, the behaviour between _replicate and _replicator is inconsistent. This has been true in the bigcouch codebase since forever, so it's arguably not release blocking, but now is the time to decide what behaviour we desire. To that end, these are all the options I think we can actually deliver in a short timeframe:

          1) remove fix_uri/possibly_hack from _replicate. This means "foo" always means a local db (and therefore unsharded and unreachable by default).

          2) prohibit local source/target in all cases (_replicate will return a 400 Bad Request and _replicator will reject a document update that tries to insert it).

          vatamane Nick Vatamaniuc added a comment -

          I like 2).

          If there is no obvious way to transform "local" sources/targets to remote counterparts, I think we should stop accepting "local" as a valid replication target or source.

          By default, local replications in 2.0 would give users the ability to clobber or replicate out of "system" databases like _dbs, _nodes and _users, which are local. We could explicitly disallow those dbs perhaps, but even then, it is not clear what the use cases for "local" replications are in general.

          Also, performing this transformation for _replicate endpoint replications but not for _replicator docs is a bit misleading and confusing.

          chrisfosterelli Chris Foster added a comment - - edited

          I think we really need a way of not specifying the full URL. We have a series of inner-cluster replications that are continuous and persistent (in the `_replicator` table), and full URLs make that a big pain.

          If all of the replication tables are set up to use a full URL, then it becomes impossible to replicate a production database elsewhere. The destination database you replicate the `_replicator` table to immediately starts double replicating the production tables, not its own databases. Since the credentials are included there is no way to stop this without also stopping replication in the production cluster.

          Even if there was just a way to say "this cluster", that would be significantly more ideal than hardcoded full URL database strings that include the username and password.

          vatamane Nick Vatamaniuc added a comment - - edited

          Chris Foster Interesting points.

          Thinking more about this, it seems hard for a node in a cluster to know the cluster's hostname in general. Say a cluster is behind a proxy for fault tolerance: after the document is added to a replicator db, I can't see how the node would know what the external cluster host is. For example, does database "a" mean https://user:pass@mycluster.com/a or http://user:pass@user.somecluster.net/a ?

          In the case of the _replicate endpoint, we look at the socket that the node is listening on and build an http URL based on that. But as Robert mentioned above, that is a hack which makes a few assumptions (http vs https, for example).

          But if that hack seems to do the right thing for you, I wonder if you can use localhost to mean "this host". It would still be a full URL, so it would work with the new validation rules. You'd lose fault tolerance, but I wonder if that would solve your problem?
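
The kind of fix-up described above for the _replicate endpoint (turning a bare database name into an http URL from the address the node listens on) might look roughly like the following Python sketch. It is not the Erlang code in chttpd.erl. The function name and its host and port parameters are hypothetical, chosen only to make the assumptions visible.

```python
from urllib.parse import quote


def fix_local_endpoint(endpoint, host, port):
    """Turn a bare database name into an http URL on the given host and port.

    Endpoints that already are http or https URLs are returned unchanged.
    Note the assumptions baked in: the scheme is always http, and the
    host is a single node's address, so fault tolerance is lost.
    """
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    # Percent-encode the name so that characters like "/" stay in one path segment.
    return f"http://{host}:{port}/{quote(endpoint, safe='')}"
```

Calling it with "localhost" as the host gives the "this host" form suggested above, for example `fix_local_endpoint("animaldb", "localhost", 5984)`.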

          wohali Joan Touzet added a comment -

          Ping Robert Newson. It'd be real swell to close this out before 2.0 is released. You seem to suggest above that your patch is inadvisable. Is another better at this point or are you comfortable with releasing the code "as-is"?

          rnewson Robert Newson added a comment -

          I'll look again but making "local" names work correctly depending on whether they are initiating from the cluster port or private port or clustered _replicator db or node-local _replicator db is a tall order, especially at the last minute.

          That the _replicator database itself can be replicated is an especially ugly consequence of this regrettable api.

          wohali Joan Touzet added a comment -

          Thanks. If you don't think we can squeeze in a big change for this last minute, I want to remove the blocker status so 2.0 can proceed. If we need to document something as a known bug or issue for the release, we can do that instead - would you be willing to help out with a paragraph or two we could use?

          For others' reference, this is the last code-related 2.0 blocker that we have.

          rnewson Robert Newson added a comment -

          I had a good stab at implementing what was needed; namely, supporting a third variant of database in couch_replicator_api_wrap that uses fabric. That work is on branch 2980-cluster-local-repl.

          I did not finish and I think it cannot be done in the timeframe of 2.0 to the quality we need. I'd like to make this not a blocker for 2.0. Users can use full remote urls for source/target to replicate clustered databases around. 2.0 will not have an answer for Chris's use case, but it's a valid one, and we should come back to this.

          rnewson Robert Newson added a comment -

          This is major work to fix properly as the clustered code base has never supported local source/target as requested here.

          Moving this down from Blocker, we'll address after 2.0.

          rnewson Robert Newson added a comment -

          this needs to be added to the release notes.

          vatamane Nick Vatamaniuc added a comment -

          I wonder if it is worth at least preventing the creation of local replications, like the original PR did? https://github.com/apache/couchdb-couch-replicator/pull/41

          Otherwise the behavior is surprising for someone with 1.x experience. And later, even if we add local clustered support (say in 2.1), it will all of a sudden do something different.

          In the meantime, is using `http://localhost:5984/db` an alternative for users to get the equivalent behavior? In other words, would that cover Chris's case of making the replicator db work as expected if it is replicated to another cluster?

          chrisfosterelli Chris Foster added a comment -

          Hi everyone,

          Just chiming in on this again. Sorry for the delay, don't check JIRA often. Sounds like you guys have summarized our parallel conclusions.

          We also stumbled across using "http://localhost:5984/db" as a workaround. It's not perfect, because it essentially means we can not change the port without having to run some weird migration script. It does ensure we won't accidentally replicate to production all the time though.

          I think part of the confusion for me is that I was coming from 1.X. This worked really well for us in 1.X because we just did "databaseA" to "databaseB" and never had to worry about URLs. In 2.X, even though we still aren't using clustering yet and would prefer to not worry about it right now, it appears that our databases are still clustered (on one host) and our method of just specifying the database name was failing in a really confusing way.

          As mentioned, this workaround works for us for now so don't let us hold you back from 2.0, but it would be cool to eventually have a better approach for this. Although, I am not sure what that might look like.
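
For readers who want to try the localhost workaround discussed in this thread, here is a minimal Python sketch that builds a replication document with full `http://localhost:PORT/db` endpoints and prepares (but does not send) the POST to `_replicator`. The helper names are hypothetical. Port 5984 is the clustered port used in the examples above, and authentication is left out.

```python
import json
from urllib.request import Request


def build_replication_doc(doc_id, source_db, target_db, port=5984, continuous=True):
    """Build a replication doc that uses localhost URLs instead of bare names."""
    base = f"http://localhost:{port}"
    return {
        "_id": doc_id,
        "source": f"{base}/{source_db}",
        "target": f"{base}/{target_db}",
        "continuous": continuous,
    }


def build_post_request(doc, port=5984):
    """Prepare a POST of the doc to the _replicator database (not sent here)."""
    return Request(
        f"http://localhost:{port}/_replicator",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = build_replication_doc("a_to_b", "databaseA", "databaseB")
request = build_post_request(doc)
# Passing `request` to urllib.request.urlopen would send it to a running node.
```

As noted above, the port is baked into every document, so changing it later means rewriting the stored replication docs.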


            People

            • Assignee: Unassigned
            • Reporter: robertkowalski Robert Kowalski
            • Votes: 0
            • Watchers: 8
