COUCHDB-3277

Replication manager crashes when it finds _replicator db shards which are not part of a mem3 db

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Currently, when it starts up, the replication manager scans the file system for shards with a _replicator suffix to discover all replicator dbs.

      However, if there is a _replicator shard without a corresponding entry in the mem3 dbs db, the replication manager crashes.

      These "orphan" replicator shards could be created during db creation, since shards are created first and the entry in the dbs db is added afterwards. They could also be left behind by a move or backup process.
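The mismatch described above can be illustrated with a small model. The sketch below is Python, not CouchDB code: it assumes a shard file layout of `shards/<range>/<dbname>.<timestamp>.couch`, and it uses a plain set in place of the mem3 dbs db. The function names are hypothetical. It lists `_replicator` shard files found on disk that have no registered db.

```python
import re
from pathlib import PurePosixPath

# Assumed shard file layout (for illustration only):
#   shards/<range>/<dbname>.<timestamp>.couch
SHARD_FILE = re.compile(r"^(?P<db>.+)\.\d+\.couch$")


def db_name_from_shard(path):
    """Return the clustered db name encoded in a shard file path, or None."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "shards":
        return None
    # Db names may contain "/", so rejoin everything after the range directory.
    match = SHARD_FILE.match("/".join(parts[2:]))
    return match.group("db") if match else None


def find_orphan_replicator_shards(shard_files, registered_dbs, suffix="_replicator"):
    """List shard files whose db name ends in `suffix` but is not registered."""
    orphans = []
    for path in shard_files:
        db = db_name_from_shard(path)
        if db is None or not db.endswith(suffix):
            continue
        if db not in registered_dbs:
            orphans.append(path)
    return orphans
```

A check like this could be run as a diagnostic before startup; whether that is practical depends on how the real shard directory is laid out on a given node.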

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-couch-replicator/pull/52

          Use mem3 to discover all _replicator shards in replicator manager

          Previously this was done via recursive db directory traversal, looking for
          shard names ending in `_replicator`. However, if there are orphaned shard
          files (not associated with a clustered db), replicator manager crashes. It
          restarts eventually, but as long as the orphaned shard file
          without an entry in dbs db is present on the file system, replicator manager
          will keep crashing and never reach some replication documents in shards which
          would be traversed after the problematic shard. The user-visible effect of this
          is that some replication documents are never triggered.

          To fix, use mem3 to traverse and discover `_replicator` shards. This approach
          has been used in Cloudant's production code for many years, so it is
          battle-tested, and it doesn't suffer from file system vs mem3 inconsistency.

          Local `_replicator` db is a special case. Since it is not clustered, it will
          not appear in the clustered db list. However, it is already handled as a special
          case in `init(_)`, so that behavior is not affected by this change.

          COUCHDB-3277
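
To make the failure mode concrete, here is a hedged Python model (the real code is Erlang, and every name below is hypothetical). `scan_filesystem` stands in for the old directory traversal and raises at the first shard without a registry entry, so later shards are never reached. `scan_registry` stands in for mem3-based discovery and only ever visits shards of registered dbs.

```python
class UnknownDb(Exception):
    """Raised when a shard has no entry in the (modelled) dbs db."""


def scan_filesystem(shards_on_disk, registry, suffix="_replicator"):
    """Old approach (modelled): walk shards found on disk, in order.

    `shards_on_disk` is a list of (shard_name, db_name) pairs. Returns the
    shards visited; raises UnknownDb at the first orphan, which models the
    crash that hides every shard after it.
    """
    visited = []
    for shard, db in shards_on_disk:
        if not db.endswith(suffix):
            continue
        if db not in registry:
            raise UnknownDb(shard)
        visited.append(shard)
    return visited


def scan_registry(registry, suffix="_replicator"):
    """New approach (modelled): start from registered dbs only.

    `registry` maps db name -> list of shard names, standing in for what
    mem3 knows about. Orphan files on disk are never consulted.
    """
    visited = []
    for db in sorted(registry):
        if db.endswith(suffix):
            visited.extend(registry[db])
    return visited
```

In this model the orphan cannot affect the registry-driven scan at all, because it never appears in the input.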

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3277

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch-replicator/pull/52.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #52


          commit 8205420d4249cea98ec5568344c43ccf11bbc9b1
          Author: Nick Vatamaniuc <vatamane@apache.org>
          Date: 2017-01-24T05:35:32Z

          Use mem3 to discover all _replicator shards in replicator manager

          jira-bot ASF subversion and git services added a comment -

          Commit b281d2bb320ed6e6d8226765315a40637ba91a46 in couchdb-couch-replicator's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=b281d2b ]

          Use mem3 to discover all _replicator shards in replicator manager

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/couchdb-couch-replicator/pull/52

          jira-bot ASF subversion and git services added a comment -

          Commit 401c1248e516f21da107b4e6c97ccb998a2408ee in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=401c124 ]

          Bump couch_replicator. Use mem3 for _replicator shard discovery

          COUCHDB-3277

          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-couch-replicator/pull/53

          Fix shards db name typo from previous commit

          The previous commit, which switched to using mem3 for replicator shard
          discovery, introduced a typo.

          `config:get("mem3", "shard_db", "dbs")`

          should be:

          `config:get("mem3", "shards_db", "_dbs")`

          COUCHDB-3277
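
Why this typo matters can be shown with a small model of a config lookup that takes a default. The sketch below is Python, not the Erlang `config` module; `get` is a hypothetical stand-in that returns the default when the key is missing. A misspelled key never raises, so the lookup silently falls through to the default, and the default must itself be the right name.

```python
def get(config, section, key, default):
    """Hypothetical stand-in for a config lookup that falls back to a default."""
    return config.get(section, {}).get(key, default)


# Case 1: nothing is configured, so only the default decides the result.
empty = {}
typo_default = get(empty, "mem3", "shard_db", "dbs")      # -> "dbs"
fixed_default = get(empty, "mem3", "shards_db", "_dbs")   # -> "_dbs"

# Case 2: an operator has set the correctly spelled key.
custom = {"mem3": {"shards_db": "my_dbs"}}
typo_custom = get(custom, "mem3", "shard_db", "dbs")      # -> "dbs" (setting ignored)
fixed_custom = get(custom, "mem3", "shards_db", "_dbs")   # -> "my_dbs"
```

In both cases the misspelled lookup yields `dbs`, which is neither the documented default nor the operator's setting.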

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3277-typo

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch-replicator/pull/53.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #53


          commit be0060f3fffc308b7532e6b99355f0e0cdede88e
          Author: Nick Vatamaniuc <vatamane@apache.org>
          Date: 2017-01-25T04:17:26Z

          Fix shards db name typo from previous commit

          jira-bot ASF subversion and git services added a comment -

          Commit be0060f3fffc308b7532e6b99355f0e0cdede88e in couchdb-couch-replicator's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=be0060f ]

          Fix shards db name typo from previous commit

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/couchdb-couch-replicator/pull/53

          jira-bot ASF subversion and git services added a comment -

          Commit 3ac5b9f4b9b648f8df0a7bf834f37608f455e559 in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=3ac5b9f ]

          Bump replicator dependency. Fixes typo in couch_replicator_manager

          mem3 `shards_db` instead of `shard_db`

          COUCHDB-3277

          jira-bot ASF subversion and git services added a comment -

          Commit fb77cbc463caa573a51f971243a5cb18ee8b2e9a in couchdb-couch-replicator's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=fb77cbc ]

          Use mem3 in couch_multidb_changes to discover _replicator shards

          This is a forward-port of a corresponding commit in master:

          "Use mem3 to discover all _replicator shards in replicator manager"

          https://github.com/apache/couchdb-couch-replicator/commit/b281d2bb320ed6e6d8226765315a40637ba91a46

          This wasn't a direct merge, as replicator shard discovery and traversal are
          slightly different.

          `couch_multidb_changes` is more generic and takes a db suffix and a callback
          module, so `<<"_replicator">>` is not hard-coded in the multidb changes module.

          `couch_replicator_manager` handles the local `_replicator` db by directly
          creating it and launching a changes feed for it. In the scheduling replicator,
          creation is separate from monitoring. The logic is handled in the `scan_all_dbs`
          function, which first checks whether there is a local db present
          matching the suffix; if so, a `{resume_scan, DbName}` is sent to the main process.
          Due to supervisor order, by the time that code runs a local replicator db
          will already have been created.

          COUCHDB-3277
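
The shape described above, a generic scan that takes a suffix and a callback and checks the local db first, can be sketched as follows. This is a Python model with hypothetical names and data structures, not the Erlang `couch_multidb_changes` implementation. It assumes the local db is named exactly like the suffix, and the `notify` callable stands in for sending `{resume_scan, DbName}` to the main process.

```python
def scan_all_dbs(local_dbs, clustered_shards, suffix, notify):
    """Model of a suffix-based scan that checks the local db first.

    local_dbs: set of local (non-clustered) db names.
    clustered_shards: dict mapping clustered db name -> list of shard names.
    suffix: db name suffix to match, passed in rather than hard-coded.
    notify: callable invoked as notify("resume_scan", name) for each match.
    """
    # The local db is not in the clustered list, so it is checked explicitly.
    # Assumption: the local db name equals the suffix itself.
    if suffix in local_dbs:
        notify("resume_scan", suffix)
    # Then every shard of every clustered db whose name ends with the suffix.
    for db in sorted(clustered_shards):
        if db.endswith(suffix):
            for shard in clustered_shards[db]:
                notify("resume_scan", shard)
```

Passing the suffix and the callback as arguments is what keeps the scan reusable for db types other than `_replicator` in this model.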


            People

            • Assignee:
              Unassigned
              Reporter:
              vatamane Nick Vatamaniuc