Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Improve CouchDB replicator

      • Allow running a large number of replication jobs
      • Improve the API with a focus on ease of use and performance. Avoid updating the replication document with transient state updates. Instead, create a proper API for querying replication states. At the same time, provide a compatibility mode to let users keep the existing behavior (of getting updates in documents).
      • Improve network resource usage and performance. Multiple connections to the same cluster could share socket connections.
      • Handle rate limiting on target and source HTTP endpoints. Let replication requests auto-discover rate limit capacity based on a proven algorithm such as an Additive Increase / Multiplicative Decrease (AIMD) feedback control loop.
      • Improve performance by avoiding repeatedly retrying failing replication jobs. Instead, use exponential backoff.
      • Improve recovery from long (but temporary) network failures. Currently, if replication jobs fail to start 10 times in a row, they will not be retried anymore. This is not always desirable. In the case of a long enough DNS (or other network) failure, replication jobs will effectively stop until they are manually restarted.
      • Better handling of filtered replications: failing to fetch filters could block the couch replicator manager and lead to message queue backups and memory exhaustion. Also, when the replication filter code changes, the replication should be updated accordingly (the replication job ID should change in that case).
      • Provide better metrics to introspect replicator behavior.
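
The AIMD feedback loop mentioned above can be sketched as follows. This is only an illustration of the general technique, not the actual couch_replicator rate limiter; the class name, step sizes, and floor value are all assumptions:

```python
# Illustrative sketch of an Additive Increase / Multiplicative Decrease
# (AIMD) feedback loop for discovering an endpoint's rate limit capacity.
# Names and constants here are hypothetical, not the couch_replicator API.

class AIMDLimiter:
    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.rate = floor          # allowed requests per interval
        self.increase = increase   # additive step on success
        self.decrease = decrease   # multiplicative factor on overload
        self.floor = floor         # never back off below this rate

    def on_success(self):
        # Additive increase: probe for more capacity, one step at a time.
        self.rate += self.increase

    def on_rate_limited(self):
        # Multiplicative decrease: back off quickly when the endpoint
        # signals overload (e.g. an HTTP 429 response).
        self.rate = max(self.floor, self.rate * self.decrease)

limiter = AIMDLimiter()
for _ in range(10):
    limiter.on_success()   # rate grows linearly while requests succeed
limiter.on_rate_limited()  # rate is halved on the first overload signal
```

The asymmetry (slow growth, fast back-off) is what lets many independent replication jobs converge on a fair share of a server's capacity without coordinating with each other.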

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb/pull/454

          Point to scheduling replicator dependencies.

          This updates chttpd, couch, and couch_replicator to point to current commits on
          63012-scheduler branches which contain code for the new scheduling replicator.

          COUCHDB-3324

          This PR can be used to test current heads of all the scheduling replicator branches on all the involved dependencies.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/couchdb 63012-scheduler

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb/pull/454.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #454


          commit e9840c74adc9c7622cda1d964294f910530066f2
          Author: Nick Vatamaniuc <vatamane@apache.org>
          Date: 2017-03-14T19:55:06Z

          Point to scheduling replicator dependencies.

          This updates chttpd, couch, and couch_replicator to point to current commits on
          63012-scheduler branches which contain code for the new scheduling replicator.

          COUCHDB-3324


          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-couch-replicator/pull/64

          63012 scheduler

          Pull request to merge scheduling replicator work to ASF master.

          This repository has most of the changes. The feature overall consists of updates to 3 repositories: this one (replicator), couch and chttpd. To test them all, there is a separate top-level PR to couchdb:

          https://github.com/apache/couchdb/pull/454

          COUCHDB-3324

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/couchdb-couch-replicator 63012-scheduler

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch-replicator/pull/64.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #64



          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-couch/pull/238

          Add _replication_start_time to the doc field validation section.

          This is part of a set of PRs to merge the new scheduling replicator

          Main PR is this: https://github.com/apache/couchdb-couch-replicator/pull/64

          Top level PR to gather and help test all the dependencies: https://github.com/apache/couchdb/pull/454

          COUCHDB-3324

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/couchdb-couch 63012-scheduler

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch/pull/238.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #238


          commit f92b68f66bfd286b8702527a2322cfc4c0c2a2bc
          Author: Nick Vatamaniuc <vatamane@apache.org>
          Date: 2016-11-04T21:27:13Z

          Add _replication_start_time to the doc field validation section.


          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-chttpd/pull/158

          63012 scheduler

          This is part of a set of PRs to merge the new scheduling replicator

          Main PR is this: apache/couchdb-couch-replicator#64

          Top level PR to gather and help test all the dependencies: apache/couchdb#454

          COUCHDB-3324

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/couchdb-chttpd 63012-scheduler

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-chttpd/pull/158.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #158


          commit b8c218b2e586adeb84d8d496105ede4cc6357331
          Author: Benjamin Bastian <benjamin.bastian@gmail.com>
          Date: 2017-03-10T21:06:04Z

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/docs` endpoint provides a view of all
          replicator docs which have been seen by the scheduler. This endpoint
          includes useful information such as the state of the replication and the
          coordinator node.

          The `_scheduler/jobs` endpoint provides a view of all replications
          managed by the scheduler. This endpoint includes more information on the
          replication than the `_scheduler/docs` endpoint, including the history
          of state transitions of the replication.
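
A minimal sketch of the two endpoint paths described above, to show how a client might address them. The paths come from the commit message; the host, port, and example comments are illustrative assumptions:

```python
# Sketch of the new scheduler introspection endpoint paths. The
# `_scheduler/{jobs,docs}` paths come from the commit message above;
# the host and port are assumptions (CouchDB's conventional default).
BASE = "http://localhost:5984"

def scheduler_url(kind):
    # "docs": every replicator doc the scheduler has seen, including the
    #         replication state and the coordinator node.
    # "jobs": replications currently managed by the scheduler, including
    #         the history of state transitions.
    if kind not in ("jobs", "docs"):
        raise ValueError("kind must be 'jobs' or 'docs'")
    return f"{BASE}/_scheduler/{kind}"

print(scheduler_url("docs"))  # http://localhost:5984/_scheduler/docs
```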

          commit d493c9573f88520c5d24ef0f7ad9fc2e19a398b2
          Author: Benjamin Bastian <benjamin.bastian@gmail.com>
          Date: 2017-03-14T18:15:52Z

          Merge pull request #3 from cloudant/63012-scheduler-tasks-api

          Add `_scheduler/{jobs,docs}` API endpoints


          vatamane Nick Vatamaniuc added a comment - Fauxton PR https://github.com/apache/couchdb-fauxton/pull/864
          jira-bot ASF subversion and git services added a comment -

          Commit ff927a3a41f5f94bea94f366b6d657a8dcaf9430 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=ff927a3 ]

          Point to scheduling replicator dependencies.

          This updates chttpd, couch, and couch_replicator to point to current commits on
          63012-scheduler branches which contain code for the new scheduling replicator.

          COUCHDB-3324

          githubbot ASF GitHub Bot added a comment -

          Github user nickva closed the pull request at:

          https://github.com/apache/couchdb/pull/454

          jira-bot ASF subversion and git services added a comment -

          Commit aecdad9de619cfffa03ab591c0e3c939c7b2b6e5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=aecdad9 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs could be
            started (up to max_jobs) in a short period of time.
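
For reference, the three options above would sit in the `[replicator]` section of the ini-style config. The values shown here are purely illustrative, not documented defaults:

```ini
[replicator]
; maximum number of replication jobs running at once (illustrative value)
max_jobs = 100
; rescheduling interval in milliseconds (illustrative value)
interval = 60000
; at most this many jobs started/stopped per rescheduling cycle (illustrative value)
max_churn = 20
```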

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected
          fields, for example) or temporarily if it is a filtered replication and
          the filter cannot be fetched. A failed filter fetch will be retried with
          an exponential backoff.
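
An exponential backoff schedule of the kind described above can be sketched as follows. The base delay, cap, and jitter policy here are assumptions for illustration, not the actual couch_replicator values:

```python
# Illustrative exponential backoff for retrying a failed (e.g. filter
# fetch) operation. Base delay, cap, and full jitter are assumptions,
# not couch_replicator's actual constants.
import random

def backoff_delay(failures, base=0.25, cap=600.0):
    # The maximum delay doubles with each consecutive failure, capped so
    # retries never stop entirely. Full jitter (uniform over [0, delay])
    # keeps many failing jobs from retrying in lockstep.
    delay = min(cap, base * (2 ** failures))
    return random.uniform(0, delay)
```

For example, after 4 consecutive failures the retry delay is drawn from [0, 0.25 * 16] = [0, 4] seconds, and no matter how many failures accumulate it never exceeds the cap.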

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a
          rescan will be initiated. The rescan will shuffle replication jobs to make
          sure a replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means if
          a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.
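
The `rest_for_one` restart semantics can be modeled with a few lines of code. The child ordering mirrors the layout described in the commit message, but this is a toy illustration of the OTP strategy, not the actual supervisor spec:

```python
# Toy model of OTP's rest_for_one restart strategy: when a child
# crashes, that child and every child started after it ("to the right")
# are restarted. Child names mirror the supervisor layout described in
# the commit message; the code itself is only an illustration.

CHILDREN = [
    "clustering",        # most fundamental, leftmost
    "connection_pool",
    "rate_limiter",
    "doc_processor",
    "multidb_changes",   # rightmost, safe to restart in isolation
]

def restarted_on_crash(crashed):
    # rest_for_one restarts the crashed child plus all children that
    # were started after it.
    i = CHILDREN.index(crashed)
    return CHILDREN[i:]
```

So a crash of `clustering` restarts everything, while a crash of `doc_processor` only restarts `doc_processor` and `multidb_changes`, leaving running replications' more fundamental services untouched.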

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4f9168422919ea6c3fafdb41212efd2e9da7e280 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=4f91684 ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/docs` endpoint provides a view of all
          replicator docs which have been seen by the scheduler. This endpoint
          includes useful information such as the state of the replication and the
          coordinator node.

          The `_scheduler/jobs` endpoint provides a view of all replications
          managed by the scheduler. This endpoint includes more information on the
          replication than the `_scheduler/docs` endpoint, including the history
          of state transitions of the replication.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 382d4880514bc63e1c07cfa810a76485917c0bec in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=382d488 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs could be
            started (up to max_jobs) in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected
          fields, for example) or temporarily if it is a filtered replication and
          the filter cannot be fetched. A failed filter fetch will be retried with
          an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a
          rescan will be initiated. The rescan will shuffle replication jobs to make
          sure a replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means if
          a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 8ab8d1c0b2ec8a1dfb84f804796001610448920e in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=8ab8d1c ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs could be
            started (up to max_jobs) in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected
          fields, for example) or temporarily if it is a filtered replication and
          the filter cannot be fetched. A failed filter fetch will be retried with
          an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a
          rescan will be initiated. The rescan will shuffle replication jobs to make
          sure a replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that
          if a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. Doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the connection
          pool.
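          The restart semantics can be illustrated with a small sketch (the child
          names here are descriptive labels, not the actual module names):

```python
def rest_for_one_restart(children, crashed):
    """Under a rest_for_one supervisor, a crash restarts the crashed
    child and every child started after it (everything to its
    'right'); children started before it keep running."""
    i = children.index(crashed)
    return children[i:]

# Ordered roughly as in the description: fundamental children first.
children = ["clustering", "connection_pool", "rate_limiter",
            "doc_processor", "multidb_changes"]

# A crash in a fundamental child takes down almost everything...
after_clustering_crash = rest_for_one_restart(children, "clustering")
# ...while a crash towards the "right" leaves the rest untouched.
after_doc_processor_crash = rest_for_one_restart(children, "doc_processor")
```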

          Jira: COUCHDB-3324

          githubbot ASF GitHub Bot added a comment -

          Github user nickva closed the pull request at:

          https://github.com/apache/couchdb-couch/pull/238

          githubbot ASF GitHub Bot added a comment -

          Github user nickva closed the pull request at:

          https://github.com/apache/couchdb-chttpd/pull/158

          jira-bot ASF subversion and git services added a comment -

          Commit 7c2c27e19e30e08afb4659cb3c427b7cc8f8404b in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=7c2c27e ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/docs` endpoint provides a view of all
          replicator docs which have been seen by the scheduler. This endpoint
          includes useful information such as the state of the replication and the
          coordinator node.

          The `_scheduler/jobs` endpoint provides a view of all replications
          managed by the scheduler. This endpoint includes more information on the
          replication than the `_scheduler/docs` endpoint, including the history
          of state transitions of the replication.
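          A hypothetical client-side look at a `_scheduler/jobs` response might be as
          follows (the field names and response shape below are illustrative, based on
          the description above, not the exact API format):

```python
import json

# Illustrative response body for a single scheduled job; the real
# endpoint returns additional fields.
body = json.loads("""
{
  "jobs": [
    {
      "id": "a1b2c3+continuous",
      "node": "node1@127.0.0.1",
      "history": [
        {"timestamp": "2017-04-01T12:00:05Z", "type": "started"},
        {"timestamp": "2017-04-01T12:00:00Z", "type": "added"}
      ]
    }
  ]
}
""")

def latest_event(job):
    # The history records state transitions, most recent first
    # (an assumption made for this sketch).
    return job["history"][0]["type"]

states = [latest_event(job) for job in body["jobs"]]
```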

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit f63267f4ff734a6925902756979ecefdee60e992 in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=f63267f ]

          Introduce couch_replicator_scheduler

          Scheduling replicator can run a large number of replication jobs by scheduling
          them. It will periodically stop some jobs and start new ones. Jobs that fail
          will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d85be6bf92c7a6aad576b5e4f8485cbeeaa565c0 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=d85be6b ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, with an option to skip design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found and updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 6908717c938ab9ac63563a746e0871300097e257 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=6908717 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.
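          The ownership tracking described above can be sketched as follows (a
          simplified single-process model; the real implementation is an Erlang
          process using monitors, and the names here are made up for illustration):

```python
class ConnectionTracker:
    """Pair clients with connections; relinquished connections are
    kept open in an idle pool for reuse rather than closed."""

    def __init__(self):
        self.owner = {}  # connection -> owning client
        self.idle = []   # open but unowned connections

    def checkout(self, client):
        # Reuse an idle connection when one exists, otherwise "open"
        # a new one (modeled here as a fresh object).
        conn = self.idle.pop() if self.idle else object()
        self.owner[conn] = client
        return conn

    def relinquish(self, conn):
        # Graceful release: the connection stays open and becomes
        # available to other clients.
        del self.owner[conn]
        self.idle.append(conn)

    def client_crashed(self, client):
        # A crashed client's paired connections are terminated,
        # never returned to the idle pool.
        for conn, owner in list(self.owner.items()):
            if owner == client:
                del self.owner[conn]

tracker = ConnectionTracker()
first = tracker.checkout("client_a")
tracker.relinquish(first)
second = tracker.checkout("client_b")  # reuses the same connection
```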

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit ddcd41954258b9a5eaa95c5ecbc6300bf9f85282 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=ddcd419 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a filter code hash into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the
            source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 3d286c1539c98bd68d25ca011a7d4b8edaeadda5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=3d286c1 ]

          Implement replication document processor

          The document processor listens for `_replicator` db document updates, parses
          those changes, then tries to add replication jobs to the scheduler.

          Listening for changes happens in the `couch_multidb_changes` module. That
          module is generic and is set up to listen to shards with the `_replicator`
          suffix by `couch_replicator_db_changes`. Updates are then passed to the
          document processor's `process_change/2` function.

          Document replication ID calculation, which can involve fetching filter code
          from the source DB, and addition to the scheduler are done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, the couch replicator manager did most of this work. There are a
          few improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common issue
            in production, because processing document updates is a serial (blocking)
            operation. Most of that blocking code was moved to separate worker
            processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted first and then re-added in
            order to update the replication. On update, the document processor will
            compare new and previous replication-related document fields and update
            the replication job if those changed. Users can freely update unrelated
            (custom) fields in their replication docs.
          • In the case of filtered replications using custom functions, the document
            processor will periodically check if the filter code on the source has
            changed. Filter code contents are factored into the replication ID
            calculation. If the filter code changes, the replication ID will change as
            well.
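          The effect of folding filter code into the ID can be illustrated like this
          (the hashing scheme below is made up for the sketch and is not CouchDB's
          actual replication ID algorithm):

```python
import hashlib

def replication_id(source, target, filter_code=None):
    """Illustrative only: derive a stable job ID from replication
    properties, folding in a hash of the filter source when present."""
    parts = [source, target]
    if filter_code is not None:
        parts.append(hashlib.md5(filter_code.encode()).hexdigest())
    return hashlib.md5("|".join(parts).encode()).hexdigest()[:8]

id_v1 = replication_id("db_a", "db_b", "function(doc){return doc.x;}")
id_v2 = replication_id("db_a", "db_b", "function(doc){return doc.y;}")
# editing only the filter body yields a different replication ID,
# so the scheduler treats the update as a new job
```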

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 9c22dc3276e5c46c9cc10347c57668b9f596e160 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=9c22dc3 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication
            jobs might not have enough time to make progress before getting
            unscheduled again. This parameter can be adjusted at runtime and will take
            effect during the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter, along with "interval", defines the rate of
            job replacement. During startup, however, a much larger number of jobs
            (up to max_jobs) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that
          if a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. Doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the connection
          pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit c2d381c48eb4f3cdf38012bcda78909b262d9c17 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=c2d381c ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/docs` endpoint provides a view of all
          replicator docs which have been seen by the scheduler. This endpoint
          includes useful information such as the state of the replication and the
          coordinator node.

          The `_scheduler/jobs` endpoint provides a view of all replications
          managed by the scheduler. This endpoint includes more information on the
          replication than the `_scheduler/docs` endpoint, including the history
          of state transitions of the replication.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit f6658fcf7ba473c7df5fd7f6b2bd267e9a540543 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=f6658fc ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/docs` endpoint provides a view of all
          replicator docs which have been seen by the scheduler. This endpoint
          includes useful information such as the state of the replication and the
          coordinator node.

          The `_scheduler/jobs` endpoint provides a view of all replications
          managed by the scheduler. This endpoint includes more information on the
          replication than the `_scheduler/docs` endpoint, including the history
          of state transitions of the replication.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 60795957bca05004ad007618ec77448ff14f55ea in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-documentation.git;h=6079595 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 86b251f9156ea1da9d6bfa6ac726382b7dc53f81 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-documentation.git;h=86b251f ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit edeaad1b369e91127097f6f903fab5d24fd27597 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-documentation.git;h=edeaad1 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit dae8101188e50518e68235c742b361b5cd420c01 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-documentation.git;h=dae8101 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit ec36f9f046f13f67f999c1497e75ddb367b45d41 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=ec36f9f ]

          Implement fabric-based _scheduler/docs endpoint

          Previously, the _scheduler/docs implementation was not optimal: all documents
          were fetched via rpc:multicall from each of the nodes.

          Switch implementation to mimic _all_docs behavior. The algorithm is roughly
          as follows:

          • chttpd endpoint:
          • parses query args like it does for any view query
          • parses states to filter by, states are kept in the `extra` query arg
          • Call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs. Here the typical fabric / rexi setup is happening.
          • Fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler. However, it is a bit more
            intricate as it handles both replication documents in a terminal state
            and those which are active.
          • Before emitting a row, it checks whether the document is in a terminal
            state. If it is, it filters on that state and decides whether the row
            should be emitted or not.
          • If the state cannot be determined from the document itself, it tries to
            fetch the active state from the local node's doc processor via a
            key-based lookup. If found, it can likewise filter on state and either
            emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling
            `undecided` docs. If one is found, then `replicator:active_doc/2` is
            called. There, all nodes where the document's shards live are queried.
            This is better than the previous implementation, where all nodes were
            queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, or
            values). Another thing that happens is that the `Total` value is
            decremented to account for the always-present _design doc.

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad-hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324
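          The per-row decision the fabric worker makes can be sketched as follows.
          This is an illustrative Python transliteration, not CouchDB's actual
          Erlang code; the helper names and state values are assumptions.

```python
# Sketch of the fabric worker's per-row logic for _scheduler/docs:
# terminal state from the doc itself, else the local doc processor's
# active state, else emit `undecided` for the coordinator to resolve.

TERMINAL_STATES = {"completed", "failed"}  # assumed terminal states

def resolve_row(doc, local_active_states, states_filter):
    """Return the row to emit for one replication doc, or None to skip it."""
    state = doc.get("state")
    if state in TERMINAL_STATES:
        # Terminal state is recorded in the document itself: filter locally.
        return doc if state in states_filter else None
    # Not terminal: try the local node's doc processor (key-based lookup).
    active = local_active_states.get(doc["_id"])
    if active is not None:
        return doc if active in states_filter else None
    # Unknown here: emit `undecided` and let the coordinator fetch the
    # state by possibly querying other nodes' doc processors.
    return {"_id": doc["_id"], "doc": "undecided"}
```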

          githubbot ASF GitHub Bot added a comment -

          Github user nickva closed the pull request at:

          https://github.com/apache/couchdb-couch-replicator/pull/64

          jira-bot ASF subversion and git services added a comment -

          Commit 4fb00071086fccf19bc93cfd0faa8936727929b5 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-documentation.git;h=4fb0007 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit b646899475a043c874830d2567053a63486dd98a in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=b646899 ]

          Introduce couch_replicator_scheduler

          Scheduling replicator can run a large number of replication jobs by scheduling
          them. It will periodically stop some jobs and start new ones. Jobs that fail
          will be penalized with an exponential backoff.

          Jira: COUCHDB-3324
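          The exponential backoff penalty described above can be sketched as a
          simple doubling delay. This is a minimal illustration; the base penalty
          and cap are assumed values, not CouchDB's actual constants.

```python
# Minimal sketch of exponential backoff: each consecutive failure doubles
# the wait before a job is eligible to run again, up to a cap.

BASE_PENALTY_SEC = 5     # assumed base penalty
MAX_PENALTY_SEC = 3600   # assumed cap

def backoff_penalty(consecutive_failures):
    """Seconds a job must wait after `consecutive_failures` failures."""
    if consecutive_failures <= 0:
        return 0
    return min(BASE_PENALTY_SEC * 2 ** (consecutive_failures - 1),
               MAX_PENALTY_SEC)
```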

          jira-bot ASF subversion and git services added a comment -

          Commit 07dad9cc2d47a90ae80708e6446e4bb9cb72cba4 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=07dad9c ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with DbSuffix, with an option to skip design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards
          are created, deleted, found, and updated.

          Jira: COUCHDB-3324
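          The callback contract above can be sketched as a listener interface
          plus suffix matching. This is a Python transliteration for
          illustration only; the real behavior is an Erlang behaviour module,
          and the shard-name layout handling here is an assumption.

```python
# Sketch of the multi-db change monitor's callback contract and suffix
# matching. Method names mirror the description, not actual code.

class MultiDbChangesListener:
    """Implementors are notified about shards matching a db suffix."""

    def db_created(self, db_name): ...
    def db_deleted(self, db_name): ...
    def db_found(self, db_name): ...        # existing shard found on scan
    def doc_updated(self, db_name, doc): ...

def matches_suffix(db_name, suffix):
    # Shard names carry a trailing ".<timestamp>" (assumed layout, e.g.
    # shards/00000000-1fffffff/_replicator.1490000000); strip it first.
    head, _, tail = db_name.rpartition(".")
    base = head if head and tail.isdigit() else db_name
    return base.endswith(suffix)
```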

          jira-bot ASF subversion and git services added a comment -

          Commit 35ca44fd421db52af2d7dafac260a39bf49bd8a1 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=35ca44f ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, it would block while the sockets
            were closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324
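          The ownership-tracking idea can be sketched as below. This is a
          heavily simplified, hypothetical Python model of the intermediary
          process; the real implementation is an Erlang process that also
          monitors clients and connections and times out idle connections.

```python
# Sketch of an intermediary that pairs clients with connections,
# reuses gracefully relinquished connections, and terminates the
# connections of a crashed client.

class ConnectionTracker:
    def __init__(self):
        self.owner_of = {}   # connection -> owning client
        self.idle = []       # relinquished connections available for reuse

    def checkout(self, client, make_connection):
        """Give `client` an idle connection if one exists, else a new one."""
        conn = self.idle.pop() if self.idle else make_connection()
        self.owner_of[conn] = client
        return conn

    def relinquish(self, conn):
        """Client is done: keep the socket open and share it with others."""
        self.owner_of.pop(conn, None)
        self.idle.append(conn)

    def client_crashed(self, client):
        """A crashed client's paired connections are terminated, not reused."""
        for conn in [c for c, o in self.owner_of.items() if o == client]:
            del self.owner_of[conn]
```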

          jira-bot ASF subversion and git services added a comment -

          Commit 5e3b2746a500923f76187007ceda685d90eb698d in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=5e3b274 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator
            dbs. This includes updating state fields, parsing options from
            documents, and making sure the replication VDU design document is in
            sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning
            and pretty-formatting of IDs. Filtered replications using user filter
            functions incorporate a filter code hash into the calculation; in that
            case the couch_replicator_filters module is called to fetch the filter
            from the source.

          Jira: COUCHDB-3324
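          The idea that filtered replications fold the filter's code into the
          replication ID can be sketched as below. The hashing scheme and
          inputs are illustrative assumptions, not CouchDB's exact algorithm.

```python
# Sketch: a replication ID derived from source, target, and (optionally)
# a hash of the filter code, so a changed filter yields a new ID.

import hashlib

def replication_id(source, target, filter_code=None):
    parts = [source, target]
    if filter_code is not None:
        # Changing the filter body changes the ID, forcing a new job.
        parts.append(hashlib.md5(filter_code.encode()).hexdigest())
    return hashlib.md5("|".join(parts).encode()).hexdigest()
```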

          jira-bot ASF subversion and git services added a comment -

          Commit b22bdab612e85d31efd433cb40ec6c7dd6beb0be in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=b22bdab ]

          Implement replication document processor

          The document processor listens for `_replicator` db document updates,
          parses those changes, then tries to add replication jobs to the scheduler.

          Listening for changes happens in the `couch_multidb_changes` module. That module is
          generic and is set up to listen to shards with `_replicator` suffix by
          `couch_replicator_db_changes`. Updates are then passed to the document
          processor's `process_change/2` function.

          Document replication ID calculation, which can involve fetching filter code
          from the source DB, and addition to the scheduler, is done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, couch replicator manager did most of this work. There are a few
          improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common
            issue in production. This is because processing document updates is a
            serial (blocking) operation. Most of that blocking code was moved to
            separate worker processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted first and then re-added
            in order to update the replication. On update, the document processor
            will compare new and previous replication-related document fields and
            update the replication job if those changed. Users can freely update
            unrelated (custom) fields in their replication docs.
          • In case of filtered replications using custom functions, the document
            processor will periodically check if the filter code on the source has
            changed. Filter code contents are factored into the replication ID
            calculation. If the filter code changes, the replication ID will
            change as well.

          Jira: COUCHDB-3324
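          The update rule above (only replication-related field changes restart
          the job) can be sketched simply. The field list here is illustrative;
          the actual set of fields the document processor compares may differ.

```python
# Sketch: decide whether a _replicator doc update requires updating the
# replication job, by comparing only replication-related fields.

REP_FIELDS = {"source", "target", "filter", "query_params", "continuous"}

def needs_job_update(old_doc, new_doc):
    """True if any replication-related field differs between revisions."""
    return any(old_doc.get(f) != new_doc.get(f) for f in REP_FIELDS)
```

          Edits to custom fields (e.g. user notes) compare equal on every
          replication-related field, so the running job is left alone.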

          jira-bot ASF subversion and git services added a comment -

          Commit 8d1aa786072b849bc7e5296b53272d59d5440f5a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=8d1aa78 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too
            high could cause performance issues. Making it too low could mean
            replication jobs might not have enough time to make progress before
            getting unscheduled again. This parameter can be adjusted at runtime
            and will take effect during the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each
            reschedule cycle the scheduler might start or stop up to "max_churn"
            number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of
            job replacement. During startup, however, a much larger number of jobs
            (up to max_jobs) could be started in a short period of time.
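          In an ini config file, the three options above would look like this
          (the values are illustrative, not recommended defaults):

```ini
[replicator]
; Up to 50 replication jobs running at once
max_jobs = 50
; Reschedule every 60 seconds
interval = 60000
; Start/stop at most 20 jobs per rescheduling cycle
max_churn = 20
```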

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected
          fields, for example) or temporarily, if it is a filtered replication and
          the filter cannot be fetched. A failed filter fetch will be retried with
          an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a
          rescan will be initiated. The rescan will shuffle replication jobs to
          make sure each replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means if
          a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree).
          Clustering, connection pool and rate limiter are towards the "left" as
          they are more fundamental; if the clustering child crashes, most other
          components will be restarted. The doc processor and multi-db changes
          children are towards the "right". If they crash, they can be safely
          restarted without affecting already-running replications or components
          like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d775c012517530b24dba4bf43c458be07613a36a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d775c01 ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information such
          as the state of the replication and the coordinator node. The
          implementation of `_scheduler/docs` closely mimics `_all_docs` behavior:
          similar pagination, HTTP request processing and fabric / rexi setup. The
          algorithm is roughly
          as follows:

          • http endpoint:
          • parses query args like it does for any view query
          • parses states to filter by, states are kept in the `extra` query arg
          • Call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs. Here the typical fabric / rexi setup is happening.
          • Fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler. However, it is a bit more
            intricate as it handles both replication documents in a terminal state
            and those which are active.
          • Before emitting a row, it checks whether the document is in a terminal
            state. If it is, it filters on that state and decides whether the row
            should be emitted or not.
          • If the state cannot be determined from the document itself, it tries to
            fetch the active state from the local node's doc processor via a
            key-based lookup. If found, it can likewise filter on state and either
            emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling
            `undecided` docs. If one is found, then `replicator:active_doc/2` is
            called. There, all nodes where the document's shards live are queried.
            This is better than the previous implementation, where all nodes were
            queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, or
            values). Another thing that happens is that the `Total` value is
            decremented to account for the always-present _design doc.

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad-hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324
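          Querying these endpoints looks like an ordinary view query. As an
          illustration, here is a small helper that builds a `_scheduler/docs`
          URL with a states filter plus the usual skip/limit parameters; the
          exact parameter spelling is an assumption based on the description
          above.

```python
# Sketch: build a _scheduler/docs query URL with view-style parameters
# and a comma-separated `states` filter.

from urllib.parse import urlencode

def scheduler_docs_url(base, states=(), limit=None, skip=None):
    params = {}
    if states:
        params["states"] = ",".join(states)   # filter by replication states
    if limit is not None:
        params["limit"] = limit               # view-style pagination
    if skip is not None:
        params["skip"] = skip
    qs = urlencode(params)
    return base + "/_scheduler/docs" + ("?" + qs if qs else "")
```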

          Show
          jira-bot ASF subversion and git services added a comment - Commit d775c012517530b24dba4bf43c458be07613a36a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d775c01 ] Add `_scheduler/ {jobs,docs} ` API endpoints The `_scheduler/jobs` endpoint provides a view of all replications managed by the scheduler. This endpoint includes more information on the replication than the `_scheduler/docs` endpoint, including the history of state transitions of the replication. This part was implemented by Benjamin Bastian. The `_scheduler/docs` endpoint provides a view of all replicator docs which have been seen by the scheduler. This endpoint includes useful information such as the state of the replication and the coordinator node. The implemention of `_scheduler/docs` mimics closely `_all_docs` behavior: similar pagination, HTTP request processing and fabric / rexi setup. The algorithm is roughly as follows: http endpoint: parses query args like it does for any view query parses states to filter by, states are kept in the `extra` query arg Call is made to couch_replicator_fabric. This is equivalent to fabric:all_docs. Here the typical fabric / rexi setup is happening. Fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is similar to fabric_rpc's all_docs handler. However it is a bit more intricate to handle both replication document in terminal state as well as those which are active. Before emitting it queries the state of the document to see if it is in a terminal state. If it is, it filters it and decides if it should be emitted or not. If the document state cannot be found from the document. It tries to fetch active state from local node's doc processor via key based lookup. If it finds, it can also filter it based on state and emit it or skip. If the document cannot be found in the node's local doc processor ETS table, the row is emitted with a doc value of `undecided`. 
This will let the coordinator fetch the state by possibly querying other nodes's doc processors. Coordinator then starts handling messages. This also mostly mimics all_docs. At this point the most interesting thing is handling `undecided` docs. If one is found, then `replicator:active_doc/2` is queried. There, all nodes where document shards live are queries. This is better than a previous implementation where all nodes were queries all the time. The final work happens in `couch_replicator_httpd` where the emitting callback is. There we only the doc is emitted (not keys, rows, values). Another thing that happens is the `Total` value is decremented to account for the always-present _design doc. Because of this a bunch of stuff was removed. Including an extra view which was build and managed by the previous implementation. As a bonus, other view-related parameters such as skip and limit seems to work out of the box and don't have to be implemented ad-hoc. Also, most importantly many thanks to Paul Davis for suggesting this approach. Jira: COUCHDB-3324
          jira-bot ASF subversion and git services added a comment -

          Commit 0d7c6e409c35ba7d26106923761ccb056604a55f in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=0d7c6e4 ]

          Introduce couch_replicator_scheduler

          The scheduling replicator can run a large number of replication jobs by
          scheduling them: it periodically stops some jobs and starts new ones. Jobs
          that fail are penalized with an exponential backoff.

          Jira: COUCHDB-3324
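          The exponential backoff penalty can be sketched as below. This is a minimal
          illustration of the technique, not the scheduler's actual code; the base
          and cap values are assumptions, not CouchDB defaults.

          ```python
          import random

          def backoff_interval(consecutive_failures, base=5.0, cap=28800.0):
              """Wait time doubles per consecutive failure, capped so a
              persistently failing job still retries every `cap` seconds at most.
              Full jitter spreads retries so failing jobs don't wake in lockstep."""
              interval = min(cap, base * (2 ** consecutive_failures))
              return random.uniform(0, interval)
          ```

          Unlike the fixed 10-retries-then-stop behavior criticized in the issue
          description, a capped backoff keeps retrying forever, so jobs recover on
          their own after long network outages.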

          jira-bot ASF subversion and git services added a comment -

          Commit d0b7b7a1b6934c39636becfca2ab0b733fcaaf58 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d0b7b7a ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, optionally skipping design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found, and updated.

          Jira: COUCHDB-3324
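          The callback contract can be rendered roughly as below. The real module is
          an Erlang behaviour; this Python class, its method names, and the event
          tuples are hypothetical stand-ins for the shape of the callbacks.

          ```python
          class ShardChangeListener:
              """Receives events for shards whose name matches a suffix."""

              def __init__(self, db_suffix, skip_ddocs=True):
                  self.db_suffix = db_suffix
                  self.skip_ddocs = skip_ddocs
                  self.events = []

              def matches(self, shard_name):
                  # Only shards ending in the configured suffix are monitored.
                  return shard_name.endswith(self.db_suffix)

              def db_created(self, shard):
                  self.events.append(("created", shard))

              def db_deleted(self, shard):
                  self.events.append(("deleted", shard))

              def db_found(self, shard):
                  self.events.append(("found", shard))

              def doc_updated(self, shard, doc_id):
                  # Mirror the `skip_ddocs` option: ignore design doc updates.
                  if self.skip_ddocs and doc_id.startswith("_design/"):
                      return
                  self.events.append(("updated", shard, doc_id))
          ```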

          jira-bot ASF subversion and git services added a comment -

          Commit 173b8f4ad0629a22c7a44ecc63e2245021b43833 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=173b8f4 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was terminated, it would block while the socket was
            closed, which could block the entire replication scheduler. By reusing
            connections, connections are never closed by clients; they are only
            ever relinquished, which is always fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324
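          The ownership tracking described above can be sketched as follows. This is
          an assumed, simplified model (the class and method names are illustrative);
          the real intermediary is an Erlang process that also monitors clients and
          connections for crashes.

          ```python
          import time

          class ConnectionTracker:
              """Pairs connections with owning clients, lets clients relinquish
              (not close) connections, and reaps connections idle too long."""

              def __init__(self, idle_timeout=30.0):
                  self.idle_timeout = idle_timeout
                  self.owner = {}       # connection -> client, while in use
                  self.idle_since = {}  # connection -> timestamp, once relinquished

              def acquire(self, client):
                  # Prefer reusing a relinquished connection over opening a new one.
                  if self.idle_since:
                      conn, _ = self.idle_since.popitem()
                  else:
                      conn = object()  # stand-in for opening a real socket
                  self.owner[conn] = client
                  return conn

              def relinquish(self, conn):
                  # Fast path: ownership is dropped without closing the socket.
                  self.owner.pop(conn, None)
                  self.idle_since[conn] = time.monotonic()

              def reap_idle(self, now=None):
                  now = now if now is not None else time.monotonic()
                  stale = [c for c, t in self.idle_since.items()
                           if now - t > self.idle_timeout]
                  for conn in stale:
                      del self.idle_since[conn]  # only here would a socket close
                  return len(stale)
          ```

          Because `relinquish` never touches the socket, a terminating replication
          can return its connections without blocking the scheduler.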

          jira-bot ASF subversion and git services added a comment -

          Commit fb1367206b7c436d3560a88dfa17e8b837be0e35 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=fb13672 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator
            dbs. This includes updating state fields, parsing options from documents,
            and making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs; handles versioning and
            pretty formatting of IDs. Filtered replications using user filter
            functions incorporate a hash of the filter code into the calculation; in
            that case the couch_replicator_filters module is called to fetch the
            filter from the source.

          Jira: COUCHDB-3324
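          The effect of hashing the filter code into the ID can be illustrated as
          below. This is not CouchDB's actual ID scheme, just a sketch of the idea:
          when the filter function changes, the job ID changes, so the replication is
          restarted with the new filter (one of the goals in the issue description).

          ```python
          import hashlib

          def replication_id(source, target, options, filter_code=None):
              """Hash endpoints and options into a stable job ID; for filtered
              replications, fold in a hash of the filter's source code."""
              h = hashlib.md5()
              for part in (source, target, repr(sorted(options.items()))):
                  h.update(part.encode("utf-8"))
              if filter_code is not None:
                  h.update(hashlib.md5(filter_code.encode("utf-8")).digest())
              return h.hexdigest()
          ```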

          jira-bot ASF subversion and git services added a comment -

          Commit f3efc5a7410f35c4a5e14661cca49e93873a9878 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=f3efc5a ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          The scheduler is the main component. It can run a large number of
          replication jobs by switching between them, periodically stopping some and
          starting others. Jobs which fail are backed off exponentially. Normal
          (non-continuous) jobs will be allowed to run to completion to preserve
          their current semantics.

          Scheduler behavior can be configured by these options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Setting this too
            high could cause performance issues; setting it too low could mean
            replication jobs might not have enough time to make progress before
            being unscheduled again. This parameter can be adjusted at runtime and
            takes effect during the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each
            rescheduling cycle the scheduler might start or stop up to `max_churn`
            jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. Together with `interval`, this defines the rate of job
            replacement. During startup, however, a much larger number of jobs
            (up to `max_jobs`) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job can fail, either
          permanently (if, for example, the document is malformed and missing
          expected fields) or temporarily (if it is a filtered replication and the
          filter cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, a rescan is initiated after a
          configurable quiet period. The rescan shuffles replication jobs to make
          sure each replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`: if a child
          crashes, all children to its "right" are restarted (visualizing the
          supervisor hierarchy as an upside-down tree). Clustering, the connection
          pool, and the rate limiter sit towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components are
          restarted. The doc processor and multi-db changes children sit towards
          the "right"; if they crash, they can be safely restarted without affecting
          already-running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324
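          One rescheduling cycle under `max_jobs` / `max_churn` can be sketched as
          below. This is an assumed simplification, not the actual
          couch_replicator_scheduler algorithm (which also weighs job history and
          backoff); the function and parameter names are illustrative.

          ```python
          def reschedule(running, pending, max_jobs, max_churn):
              """One cycle: stop up to max_churn of the longest-running jobs to
              make room, then start up to max_churn pending jobs, never exceeding
              max_jobs. Lists are ordered oldest-started / longest-waiting first."""
              churn = min(max_churn, len(pending))
              free_slots = max_jobs - len(running)
              # Stop just enough running jobs to make room for the churned starts.
              to_stop = running[:max(0, churn - free_slots)]
              still_running = running[len(to_stop):]
              can_start = min(churn, max_jobs - len(still_running))
              to_start = pending[:can_start]
              return to_stop, to_start
          ```

          With `max_jobs` saturated, each cycle swaps out up to `max_churn` old jobs
          for waiting ones, which is what lets many more replications than
          `max_jobs` make steady progress.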

          jira-bot ASF subversion and git services added a comment -

          Commit aebec51b3f1e8856f16a6552d54aea0a181a1a4d in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=aebec51 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was terminated, it would block while the socket was
            closed, which could block the entire replication scheduler. By reusing
            connections, connections are never closed by clients; they are only
            ever relinquished, which is always fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit a9170e48421e4b2348b16e54ff7020a52c50434b in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=a9170e4 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator
            dbs. This includes updating state fields, parsing options from documents,
            and making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs; handles versioning and
            pretty formatting of IDs. Filtered replications using user filter
            functions incorporate a hash of the filter code into the calculation; in
            that case the couch_replicator_filters module is called to fetch the
            filter from the source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit c13d9551e022bc9ad2f75bfc4cb9cbb0f74d5b50 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=c13d955 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          The scheduler is the main component. It can run a large number of
          replication jobs by switching between them, periodically stopping some and
          starting others. Jobs which fail are backed off exponentially. Normal
          (non-continuous) jobs will be allowed to run to completion to preserve
          their current semantics.

          Scheduler behavior can be configured by these options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Setting this too
            high could cause performance issues; setting it too low could mean
            replication jobs might not have enough time to make progress before
            being unscheduled again. This parameter can be adjusted at runtime and
            takes effect during the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each
            rescheduling cycle the scheduler might start or stop up to `max_churn`
            jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. Together with `interval`, this defines the rate of job
            replacement. During startup, however, a much larger number of jobs
            (up to `max_jobs`) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job can fail, either
          permanently (if, for example, the document is malformed and missing
          expected fields) or temporarily (if it is a filtered replication and the
          filter cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, a rescan is initiated after a
          configurable quiet period. The rescan shuffles replication jobs to make
          sure each replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`: if a child
          crashes, all children to its "right" are restarted (visualizing the
          supervisor hierarchy as an upside-down tree). Clustering, the connection
          pool, and the rate limiter sit towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components are
          restarted. The doc processor and multi-db changes children sit towards
          the "right"; if they crash, they can be safely restarted without affecting
          already-running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 75758d436454e7695d3c9a0cf56505d2c70082ac in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=75758d4 ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, optionally skipping design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found, and updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4863886b93dc9d8697f8461160abbd6dcc6b04b9 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4863886 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was terminated, it would block while the socket was
            closed, which could block the entire replication scheduler. By reusing
            connections, connections are never closed by clients; they are only
            ever relinquished, which is always fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 9cacdba2966fc004dc5dbdb19d74609b258b9223 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=9cacdba ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator
            dbs. This includes updating state fields, parsing options from documents,
            and making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs; handles versioning and
            pretty formatting of IDs. Filtered replications using user filter
            functions incorporate a hash of the filter code into the calculation; in
            that case the couch_replicator_filters module is called to fetch the
            filter from the source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 1bf2fd5223817402ed1fb0bc30833f460e2d9a4a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=1bf2fd5 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and takes effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. Together with `interval`, this parameter defines the rate of
            job replacement. During startup, however, a much larger number of jobs could
            be started (up to `max_jobs`) in a short period of time.
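          For illustration, these options might appear in a node's ini config as
          follows (the values are arbitrary examples, not recommended defaults):

```ini
[replicator]
; maximum number of actively running replication jobs
max_jobs = 500
; rescheduling interval in milliseconds
interval = 60000
; max jobs started/stopped per rescheduling cycle
max_churn = 20
```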

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (for example, if the document is malformed and missing expected
          fields) or temporarily, if it is a filtered replication and the filter cannot
          be fetched. A failed filter fetch will be retried with an exponential backoff.
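          The retry policy described above can be sketched in Python (an illustrative
          model, not the actual couch_replicator code; the base delay and cap are
          assumptions):

```python
def backoff_delay(failures, base=1.0, max_delay=3600.0):
    """Exponential backoff sketch: the delay doubles with each
    consecutive failure and is capped at max_delay seconds, so a
    persistently failing job is retried ever less often but is never
    abandoned entirely."""
    return min(base * (2 ** failures), max_delay)
```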

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          is initiated. The rescan shuffles replication jobs to make sure each
          replication job is running on only one node.
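          One simple way to realize "each job on exactly one node" is deterministic
          hash-based placement, sketched below. This is illustrative only; CouchDB's
          actual placement logic may differ:

```python
import hashlib

def owner_node(nodes, replication_id):
    """Pick a single owner node for a replication job. Every node
    computes the same answer from the same membership list, so after a
    rescan each job runs on exactly one node without coordination."""
    nodes = sorted(nodes)  # order-independent: all nodes agree
    digest = hashlib.md5(replication_id.encode()).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]
```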

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor uses a `rest_for_one` strategy. This means if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.
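          The `rest_for_one` restart rule can be modeled in a few lines of Python
          (child names follow the description above but are hypothetical; this models
          OTP's semantics, it is not OTP itself):

```python
def rest_for_one_restart(children, crashed):
    """OTP rest_for_one semantics: the crashed child and every child
    started after it are restarted; children started before it are
    left alone."""
    i = children.index(crashed)
    return children[i:]  # the set of children the supervisor restarts

# supervisor child order, left ("fundamental") to right
children = ["clustering", "connection_pool", "rate_limiter",
            "doc_processor", "multidb_changes"]
```

          So a clustering crash restarts everything, while a doc processor crash only
          restarts the doc processor and multi-db changes children.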

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit bc9647b1427d3cc65fcad2a865cbdd936e8413ce in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=bc9647b ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information such
          as the state of the replication and the coordinator node. The implementation of
          `_scheduler/docs` closely mimics `_all_docs` behavior: similar pagination,
          HTTP request processing and fabric / rexi setup. The algorithm is roughly
          as follows:

          • The HTTP endpoint parses query args like it does for any view query, and
            parses the states to filter by; states are kept in the `extra` query arg.
          • A call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs. Here the typical fabric / rexi setup happens.
          • The fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler. However, it is a bit more intricate
            as it handles both replication documents in a terminal state and those which
            are active.
          • Before emitting, the worker queries the state of the document to see if it
            is in a terminal state. If it is, it filters it and decides whether it
            should be emitted or not.
          • If the state cannot be determined from the document, the worker tries to
            fetch the active state from the local node's doc processor via a key-based
            lookup. If found, it can also be filtered based on state and emitted or
            skipped.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling `undecided`
            docs. If one is found, then `replicator:active_doc/2` is queried. There, all
            nodes where the document's shards live are queried. This is better than a
            previous implementation where all nodes were queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There only the doc is emitted (not keys, rows, values).
            Another thing that happens is that the `Total` value is decremented to
            account for the always-present _design doc.
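          The per-row decision made by the fabric worker, as described above, can be
          modeled roughly as follows (state names and the function shape are
          hypothetical; this is a sketch, not the actual code):

```python
# states that are final and stored in the document itself (illustrative set)
TERMINAL = {"completed", "failed"}

def emit_row(doc_state, local_active_state, wanted_states):
    """Decide what to emit for one replicator doc row:
    1. a terminal state recorded in the doc wins (filtered if requested),
    2. otherwise consult the local doc processor's active state,
    3. otherwise emit 'undecided' and let the coordinator resolve it
       by querying other nodes' doc processors."""
    if doc_state in TERMINAL:
        return doc_state if not wanted_states or doc_state in wanted_states else None
    if local_active_state is not None:
        return (local_active_state
                if not wanted_states or local_active_state in wanted_states
                else None)
    return "undecided"
```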

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 337b705e3d0a6d4384d01b348d5016d9f453e75f in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=337b705 ]

          Introduce couch_replicator_scheduler

          The scheduling replicator can run a large number of replication jobs by
          scheduling them. It will periodically stop some jobs and start new ones.
          Jobs that fail will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 5c5a951aad2087da1f5bcd7963a755ce901d5323 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=5c5a951 ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, optionally skipping design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found and updated.
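          A rough Python analogue of such a behaviour module might look like this
          (the class and method names are hypothetical; the real module is Erlang and
          its callback names may differ):

```python
class MultiDbChangesHandler:
    """Illustrative analogue of a couch_multidb_changes behaviour
    implementation: a handler registered for a shard suffix receives
    callbacks when matching shards are created, deleted, found, or when
    a document in them is updated."""
    def __init__(self, db_suffix, skip_ddocs=True):
        self.db_suffix = db_suffix
        self.skip_ddocs = skip_ddocs
        self.seen = []  # record of callbacks, for illustration

    def db_created(self, dbname):
        self.seen.append(("created", dbname))

    def db_deleted(self, dbname):
        self.seen.append(("deleted", dbname))

    def db_found(self, dbname):
        self.seen.append(("found", dbname))

    def doc_updated(self, dbname, doc_id):
        if self.skip_ddocs and doc_id.startswith("_design/"):
            return  # design docs ignored when skip_ddocs is set
        self.seen.append(("updated", dbname, doc_id))
```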

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4eab43d7719fb99062bb57b21b74d4bef24c1f93 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4eab43d ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.
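          The ownership-tracking idea can be sketched as follows (a simplified model
          with hypothetical names; real connections are socket-holding processes, not
          strings, and crash monitoring is omitted here):

```python
import time

class ConnectionPool:
    """Sketch of shared replication connections: clients check out a
    connection keyed by endpoint, relinquish it instead of closing it,
    and connections idle past a timeout are reaped."""
    def __init__(self, idle_timeout=30.0):
        self.idle = {}      # endpoint -> list of (conn, idle_since)
        self.in_use = {}    # conn -> endpoint
        self.idle_timeout = idle_timeout
        self._next_id = 0

    def checkout(self, endpoint):
        free = self.idle.get(endpoint, [])
        conn = free.pop()[0] if free else self._open(endpoint)
        self.in_use[conn] = endpoint
        return conn

    def relinquish(self, conn):
        # connections are returned, never closed by clients: fast path
        endpoint = self.in_use.pop(conn)
        self.idle.setdefault(endpoint, []).append((conn, time.monotonic()))

    def reap_idle(self, now=None):
        now = time.monotonic() if now is None else now
        for endpoint, conns in self.idle.items():
            self.idle[endpoint] = [
                (c, t) for (c, t) in conns if now - t < self.idle_timeout
            ]

    def _open(self, endpoint):
        self._next_id += 1
        return f"conn-{self._next_id}"
```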

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 850aadc554c0e35c0e1902c9660ec109b7a92ab5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=850aadc ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 466e63854b53042431b0310006880d5c2275e434 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=466e638 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and takes effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. Together with `interval`, this parameter defines the rate of
            job replacement. During startup, however, a much larger number of jobs could
            be started (up to `max_jobs`) in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (for example, if the document is malformed and missing expected
          fields) or temporarily, if it is a filtered replication and the filter cannot
          be fetched. A failed filter fetch will be retried with an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          is initiated. The rescan shuffles replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor uses a `rest_for_one` strategy. This means if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit a848757060849791353738d3ee62e1134f0982d0 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=a848757 ]

          Documentation for the scheduling replicator

          The "What's New" update includes feature improvements and also an implicit
          reference to 3.0.

          The "Replicator" section describes some new behavior and includes a state
          transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit a9b4dfa5743e0385f3b653d10eb4e967f196ee54 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=a9b4dfa ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          the `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and takes effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. Together with `interval`, this parameter defines the rate of
            job replacement. During startup, however, a much larger number of jobs could
            be started (up to `max_jobs`) in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (for example, if the document is malformed and missing expected
          fields) or temporarily, if it is a filtered replication and the filter cannot
          be fetched. A failed filter fetch will be retried with an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          is initiated. The rescan shuffles replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor uses a `rest_for_one` strategy. This means if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit df0bffeb323c00ee6bd0c940f4fe56ed927158ca in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=df0bffe ]

          Introduce couch_replicator_scheduler

          Scheduling replicator can run a large number of replication jobs by scheduling
          them. It will periodically stop some jobs and start new ones. Jobs that fail
          will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 37d5dbaa5a836346d6ee58fa7a7bdedcdbef85b8 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=37d5dba ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior. Call `start_link`
          with DbSuffix and, optionally, an option to skip design docs (`skip_ddocs`).
          Behavior callback functions will be called when shards are created, deleted,
          found and updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 65dfb826008842efc57290ac115b94c731b57a38 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=65dfb82 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.
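          The ownership tracking described above can be modeled roughly as follows
          (a simplified, single-threaded sketch; the class and method names are
          hypothetical, not the actual couch_replicator API):

```python
class ConnectionPool:
    """Toy model of the intermediary process: it pairs clients with
    connections, hands idle connections to new clients, and only
    opens a new connection when none is free for that endpoint."""

    def __init__(self):
        self.idle = {}    # endpoint -> list of relinquished connections
        self.owner = {}   # connection -> client currently using it

    def acquire(self, client, endpoint):
        conns = self.idle.setdefault(endpoint, [])
        # Reuse an idle connection if one exists; otherwise "open" one.
        conn = conns.pop() if conns else ("conn", endpoint, object())
        self.owner[conn] = client
        return conn

    def relinquish(self, conn):
        # Graceful release: the connection is never closed here,
        # just returned to the idle list for reuse by another client.
        endpoint = conn[1]
        del self.owner[conn]
        self.idle.setdefault(endpoint, []).append(conn)
```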

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d2af984ead3cbc4a5d2e60267171a969bb93e81b in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d2af984 ]

          AIMD based rate limiter implementation

          AIMD: additive increase / multiplicative decrease feedback control algorithm.

          https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

          This is an algorithm which converges on the available channel capacity.
          No participant knows the capacity a priori, and participants don't
          communicate or know about each other (so they cannot coordinate to divide
          the capacity among themselves).

          A variation of this is used in the TCP congestion control algorithm. AIMD is
          proven to converge, while, for example, additive increase / additive decrease
          or multiplicative increase / multiplicative decrease will not.

          A few tweaks were applied to the base control logic:

          • The estimated value is an interval (period) instead of a rate. This is for
            convenience, as callers will typically want to know how long to sleep; rate
            is just 1000 / interval, so it is easy to convert.
          • There is a hard max limit on the estimated period. This is mainly a
            practical concern: connections sleeping too long will time out, and jobs
            will waste time sleeping and consume scheduler slots that other jobs could
            be using.
          • There is a time decay component used to handle large pauses between updates.
            Given a large update interval, assume (optimistically) that some successful
            requests have been made. Intuitively, the more time passes, the less accurate
            the estimated period probably is.
          • The rate of updates applied to the algorithm is limited. This effectively
            acts as a low-pass filter and makes the algorithm handle spikes and short
            bursts of failures better. This is not a novel idea; some alternative TCP
            congestion control algorithms, such as Westwood+, do something similar.
          • Strong downward pressure is applied to the increasing interval as it
            approaches the max limit. This is done by tweaking the additive factor via
            a step function. In practice this has the effect of making it a bit harder
            for jobs to cross the maximum backoff threshold, as they would be killed
            and potentially lose intermediate work.

          Main API functions are:

          success(Key) -> IntervalInMilliseconds

          failure(Key) -> IntervalInMilliseconds

          interval(Key) -> IntervalInMilliseconds

          Key is any term hashable by phash2. Typically it would be something like
          `{Method, Url}`. The result of each function is the current period value; the
          caller would then presumably sleep for that amount of time before or after
          making requests. The current interval can be read with the interval(Key)
          function.

          The implementation uses ETS tables sharded by key, and a periodic timer
          cleans up unused items.
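          The success/failure/interval API described above can be sketched as follows.
          This is a minimal model only: the constants and the exact update rule are
          illustrative assumptions, not the actual couch_replicator values, and the
          real implementation's decay and low-pass-filter tweaks are omitted.

```python
class AIMDLimiter:
    """Sketch of an AIMD rate limiter keyed by arbitrary terms.
    The estimate is a sleep interval in milliseconds: a success
    additively lowers it (allowing faster requests), a failure
    multiplicatively raises it (backing off), capped at a hard max."""

    DECREASE = 25          # ms subtracted per success (additive)
    BACKOFF = 2.0          # interval multiplier per failure
    MIN_MS, MAX_MS = 0, 25000

    def __init__(self):
        self.intervals = {}  # key -> current interval in ms

    def success(self, key):
        cur = self.intervals.get(key, self.MIN_MS)
        self.intervals[key] = max(self.MIN_MS, cur - self.DECREASE)
        return self.intervals[key]

    def failure(self, key):
        cur = self.intervals.get(key, self.MIN_MS)
        # Start from a small floor so repeated doubling has a base.
        self.intervals[key] = min(self.MAX_MS, max(100, cur) * self.BACKOFF)
        return self.intervals[key]

    def interval(self, key):
        return self.intervals.get(key, self.MIN_MS)
```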

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4935c9247e9e04998ff372889fb7c01c19690a29 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4935c92 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading from and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty-formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the
            source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit dff5ae797b8cdac721b76904bfad95cf187a3d55 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=dff5ae7 ]

          Implement replication document processor

          The document processor listens for `_replicator` db document updates, parses
          those changes, then tries to add replication jobs to the scheduler.

          Listening for changes happens in the `couch_multidb_changes` module. That
          module is generic and is set up by `couch_replicator_db_changes` to listen to
          shards with the `_replicator` suffix. Updates are then passed to the document
          processor's `process_change/2` function.

          Document replication ID calculation, which can involve fetching filter code
          from the source DB, and addition to the scheduler, is done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, the couch replicator manager did most of this work. There are a
          few improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common issue
            in production, because processing document updates is a serial (blocking)
            operation. Most of that blocking code was moved to separate worker
            processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted first and then re-added in
            order to update the replication. On update, the document processor will
            compare the new and previous replication-related document fields and update
            the replication job if those changed. Users can freely update unrelated
            (custom) fields in their replication docs.
          • In case of filtered replications using custom functions, the document
            processor will periodically check if the filter code on the source has
            changed. The filter code contents are factored into the replication ID
            calculation; if the filter code changes, the replication ID will change as
            well.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 628cf6fe0565180ce0d335072f04839b3997be52 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=628cf6f ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication
            jobs might not have enough time to make progress before getting unscheduled
            again. This parameter can be adjusted at runtime and will take effect during
            the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter, along with "interval", defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or by
          the `couch_replicator:replicate/2` function when it is called from the
          `_replicate` HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job can fail, either
          permanently (for example, if the document is malformed and missing expected
          fields) or temporarily, if it is a filtered replication and the filter cannot
          be fetched. A failed filter fetch will be retried with an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor uses a `rest_for_one` strategy. This means that
          if a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 67c45533cfb10f80b6424b0a750ca304177ab69c in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=67c4553 ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information such
          as the state of the replication and the coordinator node. The implementation of
          `_scheduler/docs` closely mimics `_all_docs` behavior: similar pagination,
          HTTP request processing and fabric / rexi setup. The algorithm is roughly
          as follows:

          • The HTTP endpoint parses query args as it does for any view query, and
            parses the states to filter by; the states are kept in the `extra` query
            arg.
          • A call is made to couch_replicator_fabric. This is the equivalent of
            fabric:all_docs; the typical fabric / rexi setup happens here.
          • The fabric worker is `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler. However, it is a bit more
            intricate, since it handles both replication documents in a terminal state
            and those which are active.
          • Before emitting a row, it checks whether the document is in a terminal
            state. If it is, it applies the state filter and decides whether the row
            should be emitted or not.
          • If the state cannot be determined from the document itself, it tries to
            fetch the active state from the local node's doc processor via a key-based
            lookup. If found, it can likewise filter by state and emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state, possibly by querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting part is handling `undecided`
            docs. If one is found, `replicator:active_doc/2` is called. There, all nodes
            where the document's shards live are queried. This is better than a previous
            implementation, where all nodes were queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, or values).
            Another thing that happens there is that the `Total` value is decremented
            to account for the always-present _design doc.
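          The per-document decision made by the fabric worker, as described above, can
          be summarized in a small sketch (the function and variable names are
          hypothetical, not the module's actual API):

```python
def classify_row(doc_state, local_active, doc_id, wanted_states):
    """Toy model of the worker's emit decision. Returns the row to
    emit, or None to skip it when it fails the state filter."""
    def emit(state):
        # An empty filter set means "all states".
        if not wanted_states or state in wanted_states:
            return {"id": doc_id, "state": state}
        return None

    if doc_state is not None:
        # Terminal state recorded in the document itself.
        return emit(doc_state)
    active = local_active.get(doc_id)  # key-based lookup, like the local ETS table
    if active is not None:
        return emit(active)
    # Unknown locally: emit `undecided` and let the coordinator resolve it.
    return {"id": doc_id, "state": "undecided"}
```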

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad hoc.

          Also, and most importantly, many thanks to Paul Davis for suggesting this
          approach.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 67c45533cfb10f80b6424b0a750ca304177ab69c in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=67c4553 ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information
          such as the state of the replication and the coordinator node.

          The implementation of `_scheduler/docs` closely mimics `_all_docs` behavior:
          similar pagination, HTTP request processing and fabric / rexi setup. The
          algorithm is roughly as follows:

          • The HTTP endpoint parses query args as it does for any view query. The
            states to filter by are parsed and kept in the `extra` query arg.
          • A call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs. Here the typical fabric / rexi setup happens.
          • The fabric worker is `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler, but is a bit more intricate as
            it handles both replication documents in a terminal state and those which
            are active. Before emitting, it queries the state of the document to see
            if it is in a terminal state. If it is, it filters it and decides whether
            it should be emitted or not. If the document state cannot be determined
            from the document itself, it tries to fetch the active state from the
            local node's doc processor via a key-based lookup. If one is found, it
            can also filter based on state and emit or skip the row. If the document
            cannot be found in the node's local doc processor ETS table, the row is
            emitted with a doc value of `undecided`. This lets the coordinator fetch
            the state by possibly querying other nodes' doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling
            `undecided` docs. If one is found, `replicator:active_doc/2` is queried.
            There, all nodes where the document's shards live are queried. This is
            better than a previous implementation where all nodes were queried all
            the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, values).
            Another thing that happens is that the `Total` value is decremented to
            account for the always-present _design doc.

          Because of this, a bunch of code was removed, including an extra view which
          was built and managed by the previous implementation. As a bonus, other
          view-related parameters such as skip and limit seem to work out of the box
          and don't have to be implemented ad hoc.

          Also, most importantly, many thanks to Paul Davis for suggesting this
          approach.

          Jira: COUCHDB-3324
          jira-bot ASF subversion and git services added a comment -

          Commit 79f34902a32b636284cf62a58501ab3ae689498a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=79f3490 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter, along with `interval`, defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.
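          Put together, the options above could be set in a config file like this;
          the section name comes from the text, while the values are purely
          illustrative, not recommended defaults:

          ```ini
          [replicator]
          ; number of actively running replications
          max_jobs = 500
          ; scheduling interval in milliseconds
          interval = 60000
          ; max replications to start/stop per rescheduling cycle
          max_churn = 20
          ```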

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan shuffles replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental: if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit eee868d8fc39429ddfbed66fbb7deaa38783484c in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=eee868d ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter, along with `interval`, defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan shuffles replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental: if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d6888a0eccc7bceff2521f804dbb3c0e91c45f2a in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d6888a0 ]

          Introduce couch_replicator_scheduler

          The scheduling replicator can run a large number of replication jobs by
          scheduling them: it periodically stops some jobs and starts new ones. Jobs
          that fail are penalized with an exponential backoff.

          Jira: COUCHDB-3324
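          The exponential backoff penalty mentioned above could be sketched as
          follows. This is a minimal illustration, not the actual penalty function
          used by couch_replicator_scheduler; the base and cap values are assumptions:

          ```python
          def backoff_interval(consecutive_errors, base_ms=250, max_ms=8 * 60 * 60 * 1000):
              """Wait interval for a job that has crashed `consecutive_errors`
              times in a row: doubles with each failure, up to a hard cap."""
              return min(base_ms * 2 ** consecutive_errors, max_ms)
          ```

          A job with a clean run would reset its error count, dropping the penalty
          back to the base interval.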

          jira-bot ASF subversion and git services added a comment -

          Commit 4c48f69b1abddf8081ef1e04a05cb1ef1add51fb in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4c48f69 ]

          Cluster ownership module implementation

          This module maintains cluster membership information for replication and
          provides functions to check ownership of replication jobs.

          A cluster membership change is registered only after a configurable
          `cluster_quiet_period` interval has passed since the last node addition or
          removal. This is useful during rolling node reboots in a cluster: instead of
          rescanning for membership changes after every node up and down event, only
          one rescan is done at the very end.

          Jira: COUCHDB-3324
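          The quiet-period logic amounts to a debounce, which can be sketched like
          this. The class and method names are hypothetical; only the
          `cluster_quiet_period` idea comes from the commit message:

          ```python
          class ClusterMonitor:
              """Register a membership change only once quiet_period seconds
              have passed with no further node up/down events."""

              def __init__(self, quiet_period=60):
                  self.quiet_period = quiet_period
                  self.last_event = None  # timestamp of last node addition/removal

              def node_event(self, now):
                  # any node up or down event restarts the quiet timer
                  self.last_event = now

              def stable(self, now):
                  # true once the cluster has been quiet long enough to rescan
                  return self.last_event is None or now - self.last_event >= self.quiet_period
          ```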

          jira-bot ASF subversion and git services added a comment -

          Commit fe49d4465076cec073304c043c81d2d4fc0f2a62 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=fe49d44 ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, with an option to skip design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found and updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 08abed8bdc0c057ea85af32cbe4b526015496d6c in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=08abed8 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, it would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients; they are only ever relinquished, which is always a fast
            operation.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324
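          The ownership-tracking idea could be sketched roughly like this. It is a
          simplified, single-process Python illustration with hypothetical names; the
          real implementation is an Erlang process that also monitors clients and
          connections for crashes, which this sketch omits:

          ```python
          import collections

          class ConnectionTracker:
              """Intermediary that pairs clients with reusable connections."""

              def __init__(self):
                  self.idle = collections.defaultdict(list)  # url -> idle connections
                  self.owner = {}                            # connection -> client

              def acquire(self, client, url, open_conn=lambda u: object()):
                  # reuse an idle connection to the same endpoint, else open one
                  conn = self.idle[url].pop() if self.idle[url] else open_conn(url)
                  self.owner[conn] = client
                  return conn

              def relinquish(self, conn, url):
                  # graceful release: the connection stays open and becomes
                  # available to other clients instead of being closed
                  del self.owner[conn]
                  self.idle[url].append(conn)
          ```

          An idle-timeout sweep (not shown) would eventually close connections that
          sit unused for too long.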

          jira-bot ASF subversion and git services added a comment -

          Commit 0e0b42c0f18944bc2073f93deab6dc857db6da30 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=0e0b42c ]

          AIMD based rate limiter implementation

          AIMD: additive increase / multiplicative decrease feedback control algorithm.

          https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

          This is an algorithm which converges on the available channel capacity. No
          participant knows the capacity a priori, and participants don't communicate
          or know about each other (so they don't coordinate to divide the capacity
          among themselves).

          A variation of this is used in the TCP congestion control algorithm. AIMD is
          proven to converge, while, for example, additive increase / additive decrease
          or multiplicative increase / multiplicative decrease will not.

          A few tweaks were applied to the base control logic:

          • The estimated value is an interval (period) instead of a rate. This is for
            convenience, as users will probably want to know how long to sleep; rate is
            just 1000 / interval, so it is easy to transform.
          • There is a hard maximum limit for the estimated period, mainly as a
            practical concern: connections sleeping too long will time out, and jobs
            will waste time sleeping and consume scheduler slots while others could be
            running.
          • There is a time decay component used to handle large pauses between updates.
            In the case of a large update interval, assume (optimistically) that some
            successful requests have been made. Intuitively, the more time passes, the
            less accurate the estimated period probably is.
          • The rate of updates applied to the algorithm is limited. This effectively
            acts as a low-pass filter and makes the algorithm handle spikes and short
            bursts of failures better. This is not a novel idea; some alternative TCP
            control algorithms like Westwood+ do something similar.
          • There is a large downward pressure applied to the increasing interval as it
            approaches the maximum limit. This is done by tweaking the additive factor
            via a step function. In practice this has the effect of making it a bit
            harder for jobs to cross the maximum backoff threshold, as they would be
            killed and could potentially lose intermediate work.

          Main API functions are:

          success(Key) -> IntervalInMilliseconds

          failure(Key) -> IntervalInMilliseconds

          interval(Key) -> IntervalInMilliseconds

          Key is any term (hashable by phash2), typically something like {Method, Url}.
          The result of the function is the current period value. The caller would then
          presumably choose to sleep for that amount of time before or after making
          requests. The current interval can be read with the interval(Key) function.

          The implementation uses ETS tables sharded by key, and a periodic timer
          cleans up unused items.

          Jira: COUCHDB-3324
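          A minimal sketch of the control loop described above, assuming the estimated
          value is a sleep interval in milliseconds. For simplicity the additive step
          is applied directly to the interval, and the constants are illustrative, not
          the replicator's actual values:

          ```python
          class AIMD:
              """Additive-increase / multiplicative-decrease over a sleep interval."""

              def __init__(self, additive=25, factor=2, min_ms=0, max_ms=25000):
                  self.interval = min_ms
                  self.additive, self.factor = additive, factor
                  self.min_ms, self.max_ms = min_ms, max_ms

              def success(self):
                  # shrink the penalty additively after a successful request
                  self.interval = max(self.interval - self.additive, self.min_ms)
                  return self.interval

              def failure(self):
                  # grow the penalty multiplicatively after e.g. an HTTP 429
                  self.interval = min(max(self.interval, self.additive) * self.factor,
                                      self.max_ms)
                  return self.interval
          ```

          A caller would keep one such estimator per {Method, Url} key and sleep for
          the returned interval before making its next request.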

          jira-bot ASF subversion and git services added a comment -

          Commit d0fbb3bb7d0676a0d7c2cd16ef683c07d66a470e in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d0fbb3b ]

          Refactor utils into 3 modules

          Over the years, utils accumulated a lot of functionality. Clean it up a bit
          by separating it into specific modules according to semantics:

          • couch_replicator_docs : Handle reading and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetch and manipulate replication filters.
          • couch_replicator_ids : Calculate replication IDs. Handles versioning and
            pretty-formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the
            source.

          Jira: COUCHDB-3324
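          The filter-hash idea could be sketched like this. The hashing scheme and
          the field set are illustrative assumptions, not CouchDB's actual replication
          ID algorithm:

          ```python
          import hashlib

          def replication_id(source, target, options, filter_code=None):
              """Filtered replications mix a hash of the filter's source code into
              the ID, so editing the filter function changes the replication ID."""
              parts = [source, target, repr(sorted(options.items()))]
              if filter_code is not None:
                  parts.append(hashlib.md5(filter_code.encode()).hexdigest())
              return hashlib.md5("|".join(parts).encode()).hexdigest()
          ```

          With this scheme, any change to the filter code yields a new ID, which is
          what forces the replication to restart from scratch.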

          jira-bot ASF subversion and git services added a comment -

          Commit 7134881525de111037045a91db3b65f3bd0d1a8c in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=7134881 ]

          Implement replication document processor

          The document processor listens for `_replicator` db document updates, parses
          those changes, then tries to add replication jobs to the scheduler.

          Listening for changes happens in the `couch_multidb_changes` module. That
          module is generic and is set up by `couch_replicator_db_changes` to listen
          to shards with the `_replicator` suffix. Updates are then passed to the
          document processor's `process_change/2` function.

          Replication ID calculation for a document, which can involve fetching filter
          code from the source DB, and addition to the scheduler are done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, the couch replicator manager did most of this work. There are a
          few improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common issue
            in production. This is because processing document updates is a serial
            (blocking) operation. Most of that blocking code was moved to separate
            worker processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted and then re-added in order
            to update the replication. On update, the document processor will compare
            new and previous replication-related document fields and update the
            replication job if those changed. Users can freely update unrelated
            (custom) fields in their replication docs.
          • In case of filtered replications using custom functions, the document
            processor will periodically check if the filter code on the source has
            changed. Filter code contents are factored into the replication ID
            calculation, so if the filter code changes, the replication ID will change
            as well.
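          The last two points can be sketched as follows. This is an illustrative
          Python sketch, not the actual Erlang code; the field list, function names
          and hashing scheme are assumptions:

```python
import hashlib
import json

# Fields that affect the replication job; all other (custom) fields are
# ignored. This field list is illustrative, not couch_replicator's actual one.
REP_FIELDS = ("source", "target", "continuous", "filter", "query_params")

def rep_id(doc, filter_code=None):
    """Derive a replication ID from replication-related fields only.

    If the replication uses a custom filter, the filter's source code is
    included, so a change in filter code yields a new replication ID.
    """
    parts = [json.dumps(doc.get(f), sort_keys=True) for f in REP_FIELDS]
    if filter_code is not None:
        parts.append(filter_code)
    return hashlib.sha1("|".join(parts).encode()).hexdigest()

def needs_update(old_doc, new_doc):
    # Updating only custom fields leaves the running job untouched.
    return any(old_doc.get(f) != new_doc.get(f) for f in REP_FIELDS)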

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 983d7cbdd113987c1452c04433309480cbaf2fe2 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=983d7cb ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication
            jobs might not have enough time to make progress before getting unscheduled
            again. This parameter can be adjusted at runtime and will take effect during
            the next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.
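          Assuming a standard ini-style CouchDB config file, these options might be
          set like so (the values below are illustrative examples, not defaults):

```ini
[replicator]
max_jobs = 500
interval = 60000
max_churn = 20
```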

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module and then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an exponential
          backoff.
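          The exponential backoff for failed filter fetches could be sketched like
          this (an illustrative Python sketch, not the Erlang implementation; the
          base delay, cap and jitter are assumptions):

```python
import random

BASE_MS = 1000       # initial retry delay (illustrative)
MAX_MS = 60 * 1000   # cap the delay so retries never stop entirely

def backoff_ms(consecutive_failures):
    """Delay before the next filter fetch attempt, doubling per failure."""
    delay = min(MAX_MS, BASE_MS * (2 ** consecutive_failures))
    # Random jitter spreads out retries from many replication docs.
    return random.randint(delay // 2, delay)
```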

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means if
          a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental: if the clustering child crashes, most other components will be
          restarted. Doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 89fa4de776098258e5447d6a062dcdb5f5d128b5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=89fa4de ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information such
          as the state of the replication and the coordinator node. The implementation
          of `_scheduler/docs` closely mimics `_all_docs` behavior: similar pagination,
          HTTP request processing and fabric / rexi setup. The algorithm is roughly
          as follows:

          • The HTTP endpoint parses query args like it does for any view query, and
            parses the states to filter by; states are kept in the `extra` query arg.
          • A call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs; the typical fabric / rexi setup happens here.
          • The fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker
            is similar to fabric_rpc's all_docs handler. However it is a bit more
            intricate, to handle both replication documents in terminal states as well
            as those which are active.
          • Before emitting, the worker checks whether the document is in a terminal
            state. If it is, it filters it and decides whether it should be emitted
            or not.
          • If the state cannot be determined from the document itself, the worker
            tries to fetch the active state from the local node's doc processor via a
            key-based lookup. If it finds one, it can also filter based on state and
            emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling `undecided`
            docs. If one is found, then `replicator:active_doc/2` is called. There,
            all nodes where the document's shards live are queried. This is better
            than a previous implementation where all nodes were queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, or values).
            Another thing that happens is that the `Total` value is decremented to
            account for the always-present _design doc.
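          The worker-side decision in the steps above can be sketched as follows.
          This is an illustrative Python sketch under assumed state names, not the
          Erlang fabric worker:

```python
# States recorded in the doc itself once a replication finishes; the exact
# state names here are assumptions for illustration.
TERMINAL_STATES = {"completed", "failed"}

def emit_row(doc, local_active_states, state_filter):
    """Return the row to emit for one replicator doc, or None to skip it.

    local_active_states maps doc id -> state for jobs active on this node.
    state_filter is the set of states requested by the client (empty = all).
    """
    state = doc.get("state")
    if state in TERMINAL_STATES:
        # Terminal state lives in the doc itself; filter locally.
        return doc if not state_filter or state in state_filter else None
    state = local_active_states.get(doc["_id"])
    if state is not None:
        # Active on this node; filter using the doc processor's state.
        return doc if not state_filter or state in state_filter else None
    # Unknown here: emit `undecided` and let the coordinator resolve it
    # by querying other nodes' doc processors.
    return {"_id": doc["_id"], "doc": "undecided"}
```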

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 2bf76b066c50748789d7b66748006ae70efacf3f in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2bf76b0 ]

          AIMD based rate limiter implementation

          AIMD: additive increase / multiplicative decrease feedback control algorithm.

          https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

          This is an algorithm which converges on the available channel capacity.
          No participant knows the capacity a priori, and participants don't
          communicate with or know about each other (so they don't coordinate to
          divide the capacity among themselves).

          A variation of this is used in the TCP congestion control algorithm. It is
          proven to converge, while, for example, additive increase / additive decrease
          or multiplicative increase / multiplicative decrease will not.

          A few tweaks were applied to the base control logic:

          • Estimated value is an interval (period) instead of a rate. This is for
            convenience, as users will probably want to know how much to sleep. But,
            rate is just 1000 / interval, so it is easy to transform.
          • There is a hard max limit for the estimated period. This is mainly a
            practical concern, as connections sleeping too long will time out and / or
            jobs will waste time sleeping and consume scheduler slots while others
            could be running.
          • There is a time decay component used to handle large pauses between
            updates. In case of a large update interval, assume (optimistically) that
            some successful requests have been made. Intuitively, the more time passes,
            the less accurate the estimated period probably is.
          • The rate of updates applied to the algorithm is limited. This effectively
            acts as a low pass filter and makes the algorithm handle spikes and short
            bursts of failures better. This is not a novel idea; some alternative TCP
            control algorithms like Westwood+ do something similar.
          • There is a large downward pressure applied to the increasing interval as
            it approaches the max limit. This is done by tweaking the additive factor
            via a step function. In practice this has the effect of making it a bit
            harder for jobs to cross the maximum backoff threshold, as they would be
            killed and potentially lose intermediate work.

          The main API functions are:

              success(Key) -> IntervalInMilliseconds
              failure(Key) -> IntervalInMilliseconds
              interval(Key) -> IntervalInMilliseconds

          Key is any (hashable by phash2) term. Typically it would be something like
          `{Method, Url}`. The result of each function is the current period value.
          The caller would then presumably choose to sleep for that amount of time
          before or after making requests. The current interval can be read with the
          interval(Key) function.

          The implementation shards ETS tables based on the key, and a periodic timer
          cleans out unused items.
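          A minimal sketch of the control loop (in Python rather than Erlang, with
          illustrative constants; since the estimate is an interval rather than a
          rate, AIMD on the rate becomes an additive decrease / multiplicative
          increase of the interval):

```python
# Illustrative constants, not CouchDB's; intervals are in milliseconds.
MAX_INTERVAL = 25000   # hard cap on the backoff interval
MIN_INTERVAL = 0
ADDITIVE_STEP = 25     # subtracted on success (additive decrease of interval)
MULT_FACTOR = 2        # multiplied on failure (multiplicative increase)

_intervals = {}        # Key (e.g. a (Method, Url) tuple) -> current interval

def success(key):
    """Request succeeded: shrink the sleep interval additively."""
    cur = _intervals.get(key, MIN_INTERVAL)
    _intervals[key] = max(MIN_INTERVAL, cur - ADDITIVE_STEP)
    return _intervals[key]

def failure(key):
    """Request was rate limited: grow the interval multiplicatively
    (seeded with one additive step when starting from zero)."""
    cur = _intervals.get(key, MIN_INTERVAL)
    _intervals[key] = min(MAX_INTERVAL, max(ADDITIVE_STEP, cur * MULT_FACTOR))
    return _intervals[key]

def interval(key):
    """Current interval the caller should sleep before its next request."""
    return _intervals.get(key, MIN_INTERVAL)
```

          The time decay, update-rate limiting and step-function pressure described
          above are left out of this sketch.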

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 87f4ca0454909466b068d9c10c11467a3e85d3cb in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=87f4ca0 ]

          Implement replication document processor

          Document processor listens for `_replicator` db document updates, parses those
          changes then tries to add replication jobs to the scheduler.

          Listening for changes happens in the `couch_multidb_changes` module. That
          module is generic and is set up to listen to shards with the `_replicator`
          suffix by `couch_replicator_db_changes`. Updates are then passed to the document
          processor's `process_change/2` function.

          Document replication ID calculation, which can involve fetching filter code
          from the source DB, and addition to the scheduler, is done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, the couch replicator manager did most of this work. There are a
          few improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common issue
            in production. This is because processing document updates is a serial
            (blocking) operation. Most of that blocking code was moved to separate worker
            processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted first and then re-added in
            order to update the replication. On update, the document processor will
            compare the new and previous replication-related document fields and update
            the replication job if those changed. Users can freely update unrelated
            (custom) fields in their replication docs.
          • In case of filtered replications using custom functions, the document
            processor will periodically check if the filter code on the source has
            changed. Filter code contents are factored into the replication ID
            calculation, so if the filter code changes, the replication ID will change
            as well.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit a7f240f895c8c9d0a592b4e340b0d7125d3360cb in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=a7f240f ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information such
          as the state of the replication and the coordinator node. The implementation
          of `_scheduler/docs` closely mimics `_all_docs` behavior: similar pagination,
          HTTP request processing and fabric / rexi setup. The algorithm is roughly
          as follows:

          • The HTTP endpoint parses query args like it does for any view query, and
            parses the states to filter by; states are kept in the `extra` query arg.
          • A call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs; the typical fabric / rexi setup happens here.
          • The fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker
            is similar to fabric_rpc's all_docs handler. However it is a bit more
            intricate, to handle both replication documents in terminal states as well
            as those which are active.
          • Before emitting, the worker checks whether the document is in a terminal
            state. If it is, it filters it and decides whether it should be emitted
            or not.
          • If the state cannot be determined from the document itself, the worker
            tries to fetch the active state from the local node's doc processor via a
            key-based lookup. If it finds one, it can also filter based on state and
            emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • The coordinator then starts handling messages. This also mostly mimics
            all_docs. At this point the most interesting thing is handling `undecided`
            docs. If one is found, then `replicator:active_doc/2` is called. There,
            all nodes where the document's shards live are queried. This is better
            than a previous implementation where all nodes were queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, or values).
            Another thing that happens is that the `Total` value is decremented to
            account for the always-present _design doc.

          Because of this, a fair amount of code was removed, including an extra view
          which was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324

          Show
          jira-bot ASF subversion and git services added a comment - Commit a7f240f895c8c9d0a592b4e340b0d7125d3360cb in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=a7f240f ] Add `_scheduler/ {jobs,docs} ` API endpoints The `_scheduler/jobs` endpoint provides a view of all replications managed by the scheduler. This endpoint includes more information on the replication than the `_scheduler/docs` endpoint, including the history of state transitions of the replication. This part was implemented by Benjamin Bastian. The `_scheduler/docs` endpoint provides a view of all replicator docs which have been seen by the scheduler. This endpoint includes useful information such as the state of the replication and the coordinator node. The implemention of `_scheduler/docs` mimics closely `_all_docs` behavior: similar pagination, HTTP request processing and fabric / rexi setup. The algorithm is roughly as follows: http endpoint: parses query args like it does for any view query parses states to filter by, states are kept in the `extra` query arg Call is made to couch_replicator_fabric. This is equivalent to fabric:all_docs. Here the typical fabric / rexi setup is happening. Fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is similar to fabric_rpc's all_docs handler. However it is a bit more intricate to handle both replication document in terminal state as well as those which are active. Before emitting it queries the state of the document to see if it is in a terminal state. If it is, it filters it and decides if it should be emitted or not. If the document state cannot be found from the document. It tries to fetch active state from local node's doc processor via key based lookup. If it finds, it can also filter it based on state and emit it or skip. If the document cannot be found in the node's local doc processor ETS table, the row is emitted with a doc value of `undecided`. 
This will let the coordinator fetch the state by possibly querying other nodes's doc processors. Coordinator then starts handling messages. This also mostly mimics all_docs. At this point the most interesting thing is handling `undecided` docs. If one is found, then `replicator:active_doc/2` is queried. There, all nodes where document shards live are queries. This is better than a previous implementation where all nodes were queries all the time. The final work happens in `couch_replicator_httpd` where the emitting callback is. There we only the doc is emitted (not keys, rows, values). Another thing that happens is the `Total` value is decremented to account for the always-present _design doc. Because of this a bunch of stuff was removed. Including an extra view which was build and managed by the previous implementation. As a bonus, other view-related parameters such as skip and limit seems to work out of the box and don't have to be implemented ad-hoc. Also, most importantly many thanks to Paul Davis for suggesting this approach. Jira: COUCHDB-3324
          jira-bot ASF subversion and git services added a comment -

          Commit 2d8d58bff41f7bfda8ab106e462d108990595615 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2d8d58b ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured via these options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.
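
          As a sketch, these options would live in the standard ini-style config; the
          values below are arbitrary examples for illustration, not recommended
          defaults from this commit:

```ini
[replicator]
; maximum number of actively running replication jobs
max_jobs = 500
; rescheduling interval in milliseconds
interval = 60000
; max replications started/stopped per rescheduling cycle
max_churn = 20
```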

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job can fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. Doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.
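
          Those rest_for_one semantics can be illustrated with a small sketch (child
          names follow the text above; this is not the actual supervisor code):

```python
# Sketch of rest_for_one restart semantics: when a child crashes, it and
# every child started after it ("to the right") are restarted; children
# started before it are left alone.
CHILDREN = [
    "clustering",       # most fundamental, leftmost
    "connection_pool",
    "rate_limiter",
    "doc_processor",    # safe to restart independently
    "multidb_changes",  # rightmost
]

def restarted_children(crashed):
    """Return the children a rest_for_one supervisor restarts when
    `crashed` dies: the crashed child and everything after it."""
    i = CHILDREN.index(crashed)
    return CHILDREN[i:]
```

          For example, a crash of `clustering` restarts everything, while a crash of
          `doc_processor` only restarts itself and `multidb_changes`.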

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 06f9cad4a2fcf7a2a82ef7e155468f1b044ac52c in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=06f9cad ]

          Cluster ownership module implementation

          This module maintains cluster membership information for replication and
          provides functions to check ownership of replication jobs.

          A cluster membership change is registered only after a configurable
          `cluster_quiet_period` interval has passed since the last node addition or
          removal. This is useful in cases of rolling node reboots in a cluster, in
          order to avoid rescanning for membership changes after every node up and down
          event, and instead doing only one rescan at the very end.
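
          A minimal sketch of that quiet-period debounce logic (hypothetical names; the
          real couch_replicator_clustering internals are not shown in this comment):

```python
import time

class QuietPeriodRescan:
    """Debounce membership events: only allow a rescan once no node
    up/down event has been seen for `quiet_period` seconds."""

    def __init__(self, quiet_period=60):
        self.quiet_period = quiet_period
        self.last_event = None

    def membership_changed(self, now=None):
        # Called on every node addition or removal; resets the clock.
        self.last_event = time.monotonic() if now is None else now

    def rescan_due(self, now=None):
        # True once the cluster has been quiet for the full period.
        if self.last_event is None:
            return False
        now = time.monotonic() if now is None else now
        return now - self.last_event >= self.quiet_period
```

          During a rolling reboot each node event resets the clock, so only a single
          rescan happens after the last node settles.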

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 041ffa13aeae0cf5e5410d29d461b32a955142bc in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=041ffa1 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.
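
          The ownership tracking described above can be sketched as follows (a
          hypothetical simplification; the real implementation is an Erlang process
          using monitors, which is not shown here):

```python
class ConnectionOwner:
    """Track which client currently owns which connection. Relinquished
    connections go into an idle pool and are handed to the next client
    instead of being closed."""

    def __init__(self):
        self.owner = {}   # connection -> client
        self.idle = []    # relinquished, reusable connections

    def acquire(self, client, make_conn):
        # Reuse an idle connection when possible; otherwise open one.
        conn = self.idle.pop() if self.idle else make_conn()
        self.owner[conn] = client
        return conn

    def relinquish(self, conn):
        # Graceful release: the connection stays open for the next client.
        del self.owner[conn]
        self.idle.append(conn)
```

          Closing idle connections after a timeout and tearing down a pairing when
          either side crashes would be layered on top of this bookkeeping.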

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 516692f22b69f79ae6c89454cced364ac58088e5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=516692f ]

          AIMD based rate limiter implementation

          AIMD: additive increase / multiplicative decrease feedback control algorithm.

          https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

          This is an algorithm which converges on the available channel capacity.
          Participants don't know the capacity a priori, and they don't communicate or
          know about each other (so they cannot coordinate to divide the capacity
          among themselves).

          A variation of this is used in the TCP congestion control algorithm. It is
          proven to converge, while, for example, additive increase / additive decrease
          or multiplicative increase / multiplicative decrease variants won't.

          A few tweaks were applied to the base control logic:

          • Estimated value is an interval (period) instead of a rate. This is for
            convenience, as users will probably want to know how much to sleep. But,
            rate is just 1000 / interval, so it is easy to transform.
          • There is a hard max limit for estimated period. Mainly as a practical concern
            as connections sleeping too long will timeout and / or jobs will waste time
            sleeping and consume scheduler slots, while others could be running.
          • There is a time decay component used to handle large pauses between updates.
            In case of large update interval, assume (optimistically) some successful
            requests have been made. Intuitively, the more time passes, the less accurate
            the estimated period probably is.
          • The rate of updates applied to the algorithm is limited. This effectively
            acts as a low-pass filter and makes the algorithm better at handling spikes
            and short bursts of failures. This is not a novel idea; some alternative
            TCP congestion control algorithms like Westwood+ do something similar.
          • There is a large downward pressure applied to the increasing interval as it
            approaches the max limit. This is done by tweaking the additive factor via
            a step function. In practice this has the effect of making it a bit
            harder for jobs to cross the maximum backoff threshold, as they would be
            killed and potentially lose intermediate work.

          Main API functions are:

          success(Key) -> IntervalInMilliseconds

          failure(Key) -> IntervalInMilliseconds

          interval(Key) -> IntervalInMilliseconds

          Key is any term hashable by phash2, typically something like `{Method, Url}`.
          The result of the function is the current interval value. The caller would
          then presumably choose to sleep for that amount of time before or after
          making requests. The current interval can be read with the interval(Key)
          function.

          Implementation is sharded ETS tables based on the key and there is a periodic
          timer which cleans unused items.
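
          A minimal interval-domain AIMD sketch of the success/failure/interval API
          (the constants and names are illustrative assumptions, not CouchDB's actual
          tuning, and the time decay and update-rate limiting tweaks are omitted):

```python
class AimdInterval:
    """Track a backoff interval per key: failures grow it
    multiplicatively, successes shrink it additively. This is AIMD
    expressed on the interval (period) rather than the rate."""

    def __init__(self, base=25, max_interval=25000, dec=25, backoff=2.0):
        self.base = base                  # minimum interval, ms
        self.max_interval = max_interval  # hard max, ms
        self.dec = dec                    # additive shrink on success, ms
        self.backoff = backoff            # multiplicative growth on failure
        self.intervals = {}               # key -> current interval

    def interval(self, key):
        return self.intervals.get(key, self.base)

    def success(self, key):
        cur = self.interval(key)
        self.intervals[key] = max(self.base, cur - self.dec)
        return self.intervals[key]

    def failure(self, key):
        cur = self.interval(key)
        self.intervals[key] = min(self.max_interval, cur * self.backoff)
        return self.intervals[key]
```

          A caller would sleep for `interval((Method, Url))` milliseconds around
          requests, so repeated failures quickly back off while steady successes
          gradually restore the full request rate.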

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 3a0ea5f6d77113729eaf03082c0957fccaa4dddd in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=3a0ea5f ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetches and manipulates replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a filter code hash into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the
            source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 144452fcd8aa29a8757a4b3042f78d2de6c14d94 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=144452f ]

          Implement replication document processor

          Document processor listens for `_replicator` db document updates, parses those
          changes then tries to add replication jobs to the scheduler.

          Listening for changes happens in `couch_multidb_changes module`. That module is
          generic and is set up to listen to shards with `_replicator` suffix by
          `couch_replicator_db_changes`. Updates are then passed to the document
          processor's `process_change/2` function.

          Document replication ID calculation, which can involve fetching filter code
          from the source DB, and addition to the scheduler, is done in a separate
          worker process: `couch_replicator_doc_processor_worker`.

          Previously, the couch replicator manager did most of this work. There are a
          few improvements over the previous implementation:

          • Invalid (malformed) replication documents are immediately failed and will
            not be continuously retried.
          • Replication manager message queue backups are unfortunately a common issue
            in production. This is because processing document updates is a serial
            (blocking) operation. Most of that blocking code was moved to separate
            worker processes.
          • Failing filter fetches have an exponential backoff.
          • Replication documents don't have to be deleted first and then re-added in
            order to update the replication. On update, the document processor compares
            new and previous replication-related document fields and updates the
            replication job if those changed. Users can freely update unrelated
            (custom) fields in their replication docs.
          • In case of filtered replications using custom functions, the document
            processor will periodically check if filter code on the source has changed.
            Filter code contents are factored into the replication ID calculation, so
            if the filter code changes the replication ID will change as well.
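
          The exponential backoff on failed filter fetches can be sketched as (the
          base delay and cap here are illustrative assumptions, not the shipped
          values):

```python
def filter_retry_delay(consecutive_failures, base=5, cap=3600):
    """Seconds to wait before retrying a failed filter fetch: the delay
    doubles with each consecutive failure, up to a hard cap."""
    return min(cap, base * 2 ** consecutive_failures)
```

          This keeps a permanently unreachable filter source from hammering the
          endpoint while still retrying promptly after a brief outage.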

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 5b612222a42f653b9396301bc2802f29dd133436 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=5b61222 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured via these options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the
            next rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job can fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with an
          exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats were added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. Doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 5fcb735df9aa84b201d359a914324ee05ccfbb15 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=5fcb735 ]

          Add `_scheduler/{jobs,docs}` API endpoints

          The `_scheduler/jobs` endpoint provides a view of all replications managed by
          the scheduler. This endpoint includes more information on the replication than
          the `_scheduler/docs` endpoint, including the history of state transitions of
          the replication. This part was implemented by Benjamin Bastian.

          The `_scheduler/docs` endpoint provides a view of all replicator docs which
          have been seen by the scheduler. This endpoint includes useful information
          such as the state of the replication and the coordinator node. The
          implementation of `_scheduler/docs` closely mimics `_all_docs` behavior:
          similar pagination, HTTP request processing and fabric / rexi setup. The
          algorithm is roughly as follows:

          • HTTP endpoint: parses query args like it does for any view query, and
            parses the states to filter by; states are kept in the `extra` query arg.
          • A call is made to couch_replicator_fabric. This is equivalent to
            fabric:all_docs. Here the typical fabric / rexi setup happens.
          • Fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker is
            similar to fabric_rpc's all_docs handler. However, it is a bit more intricate,
            as it has to handle both replication documents in a terminal state and those
            which are active.
          • Before emitting it queries the state of the document to see if it is in a
            terminal state. If it is, it filters it and decides if it should be
            emitted or not.
          • If the document state cannot be determined from the document itself, it
            tries to fetch the active state from the local node's doc processor via a
            key-based lookup. If it finds one, it can also filter based on that state
            and either emit or skip the row.
          • If the document cannot be found in the node's local doc processor ETS
            table, the row is emitted with a doc value of `undecided`. This will
            let the coordinator fetch the state by possibly querying other nodes'
            doc processors.
          • Coordinator then starts handling messages. This also mostly mimics all_docs.
            At this point the most interesting thing is handling `undecided` docs. If
            one is found, then `replicator:active_doc/2` is queried. There, all nodes
            where the document's shards live are queried. This is better than a previous
            implementation where all nodes were queried all the time.
          • The final work happens in `couch_replicator_httpd`, where the emitting
            callback is. There, only the doc is emitted (not keys, rows, values).
            Another thing that happens is the `Total` value is decremented to
            account for the always-present _design doc.

          Because of this, a bunch of code was removed, including an extra view which
          was built and managed by the previous implementation.

          As a bonus, other view-related parameters such as skip and limit seem to
          work out of the box and don't have to be implemented ad-hoc.

          Also, most importantly many thanks to Paul Davis for suggesting this approach.

          Jira: COUCHDB-3324
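The per-document decision the fabric worker makes can be sketched in Python. This is a hedged model of the algorithm described above, not the actual Erlang code; the terminal state names and the shape of the local lookup table are assumptions for illustration.

```python
# Hypothetical sketch of the fabric worker's emit decision for one
# replicator doc. Terminal states are assumed to be recorded in the doc
# body; active states live in a node-local (ETS-like) table.
TERMINAL_STATES = {"completed", "failed"}  # assumed names, for illustration

def worker_emit(doc, local_active_states, wanted_states):
    """Return the value to emit for one replicator doc, or None to skip it."""
    state = doc.get("state")
    if state in TERMINAL_STATES:
        # Terminal state is in the document itself: filter right here.
        return doc if (not wanted_states or state in wanted_states) else None
    # Otherwise try the node-local doc processor table by key.
    active = local_active_states.get(doc["_id"])
    if active is not None:
        return doc if (not wanted_states or active in wanted_states) else None
    # Unknown locally: emit `undecided` so the coordinator can resolve it
    # by querying other nodes' doc processors.
    return "undecided"
```

The coordinator side then only has to resolve the `undecided` rows, instead of querying every node for every document.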

          jira-bot ASF subversion and git services added a comment -

          Commit 1cbeb8cf77d5e8554dbafc6a627daee674759562 in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=1cbeb8c ]

          Introduce couch_replicator_scheduler

          Scheduling replicator can run a large number of replication jobs by scheduling
          them. It will periodically stop some jobs and start new ones. Jobs that fail
          will be penalized with an exponential backoff.

          Jira: COUCHDB-3324
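The exponential backoff penalty can be illustrated with a small sketch. The base delay, cap, and jitter here are made-up values for illustration, not CouchDB's actual settings.

```python
import random

# Illustrative sketch of penalizing a failing job with exponential backoff:
# each consecutive failure doubles the wait before the scheduler will try
# the job again, up to a cap.
def backoff_interval(consecutive_errors, base=5, cap=3600):
    """Delay (seconds) before a job with N consecutive failures may restart."""
    delay = min(cap, base * 2 ** max(consecutive_errors - 1, 0))
    # Jitter spreads restarts out so failing jobs don't wake up in lockstep.
    return delay + random.uniform(0, delay / 10)
```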

          jira-bot ASF subversion and git services added a comment -

          Commit 22a7ac737b424a5abec815003f2ec69250a2304b in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=22a7ac7 ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior. Call `start_link`
          with DbSuffix, with an option to skip design docs (`skip_ddocs`). Behavior
          callback functions will be called when shards are created, deleted, found and
          updated.

          Jira: COUCHDB-3324
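A Python analogue of the behavior can make the callback contract concrete. This is a hedged sketch, not the Erlang `couch_multidb_changes` API; the class and method names are illustrative.

```python
# Toy model of multi-db shard change monitoring: only shards whose name
# matches the configured suffix are watched, design-doc updates can be
# skipped, and every event is dispatched to a handler callback.
class ShardMonitor:
    def __init__(self, db_suffix, handler, skip_ddocs=False):
        self.db_suffix = db_suffix
        self.handler = handler
        self.skip_ddocs = skip_ddocs

    def feed(self, event, shard, doc_id=None):
        if not shard.endswith(self.db_suffix):
            return  # only shards matching the suffix are monitored
        if doc_id and self.skip_ddocs and doc_id.startswith("_design/"):
            return  # optionally ignore design-doc updates
        # Dispatch to db_created / db_deleted / db_found / db_updated.
        getattr(self.handler, "db_" + event)(shard, doc_id)
```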

          jira-bot ASF subversion and git services added a comment -

          Commit 2f2832ea924ea3d671c7d8fc8919badf8908f657 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2f2832e ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324
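The ownership tracking described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the actual couch_replicator connection pool; all names are illustrative.

```python
# Simplified model of the intermediary process that pairs clients with
# connections: relinquished connections stay open and are handed to the
# next client, while a client crash takes its paired connections with it.
class ConnectionTracker:
    def __init__(self):
        self.idle = []      # relinquished connections, ready for reuse
        self.owned = {}     # connection -> owning client
        self.counter = 0

    def acquire(self, client):
        conn = self.idle.pop() if self.idle else self._open()
        self.owned[conn] = client
        return conn

    def relinquish(self, conn):
        # Graceful release: the connection is shared, never closed here.
        del self.owned[conn]
        self.idle.append(conn)

    def client_crashed(self, client):
        # A crashed client's paired connections are terminated, not reused.
        for conn, owner in list(self.owned.items()):
            if owner == client:
                del self.owned[conn]

    def _open(self):
        self.counter += 1
        return "conn-%d" % self.counter
```

Because `relinquish` only moves the connection to an idle list, the scheduler never blocks on socket teardown; idle-timeout closing would happen in a separate sweep.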

          jira-bot ASF subversion and git services added a comment -

          Commit d98e7cfed43d5cde66c83feff12dfcd6c30ef232 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d98e7cf ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handle reading and writing to replicator dbs.
            It includes updating state fields, parsing options from documents, and
            making sure replication VDU design document is in sync.
          • couch_replicator_filters : Fetch and manipulate replication filters.
          • couch_replicator_ids : Calculate replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the source.

          Jira: COUCHDB-3324
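The idea behind couch_replicator_ids can be sketched as follows: the ID is a hash over the replication's defining properties, and for filtered replications the filter's source code is folded into the hash, so the ID changes whenever the filter changes. The exact fields and hash scheme here are assumptions, not the real algorithm.

```python
import hashlib

# Hedged sketch: derive a stable replication ID from source, target and
# options; a filtered replication also mixes in a hash of the filter code,
# so editing the filter yields a new replication ID (and thus a new job).
def replication_id(source, target, options, filter_code=None):
    parts = [source, target, repr(sorted(options.items()))]
    if filter_code is not None:
        parts.append(hashlib.md5(filter_code.encode()).hexdigest())
    return hashlib.md5("|".join(parts).encode()).hexdigest()
```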

          jira-bot ASF subversion and git services added a comment -

          Commit eff3b397570556d5fc9fcb6a3c7b8bb1d16c7ed5 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=eff3b39 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          `[replicator]` sections:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs could be
            started (up to max_jobs) in a short period of time.
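How `max_jobs` and `max_churn` interact each cycle can be modeled with a back-of-the-envelope calculation. This mirrors the description above, not the actual scheduler code; the function and its arguments are illustrative.

```python
# Toy model of one reschedule cycle: given how many jobs are running and
# how many are pending, decide how many to stop and start, bounded by
# max_churn replacements per cycle and max_jobs concurrently running.
def churn_plan(running, pending, max_jobs, max_churn):
    # How many pending jobs we'd like to start this cycle.
    want = min(pending, max_churn)
    # Stop just enough running jobs to free slots for them.
    stops = min(max_churn, max(0, running + want - max_jobs))
    # Start as many as fit after the stops take effect.
    starts = min(want, max_jobs - (running - stops))
    return stops, starts
```

With `interval` this sets the job replacement rate: at most `max_churn` replacements every `interval` milliseconds once the scheduler is saturated.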

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          Document processor listens for updates via the couch_multidb_changes module,
          then tries to add replication jobs to the scheduler. Sometimes translating a
          document update to a replication job can fail, either permanently (for
          example, if the document is malformed and missing expected fields) or
          temporarily, if it is a filtered replication and the filter cannot be
          fetched. A failed filter fetch will be retried with an exponential backoff.
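The permanent-versus-temporary split can be sketched like this. The field names, exception classes, and fetch hook are assumptions for illustration, not the replicator's actual code.

```python
# Hedged sketch of translating a replicator doc into a job: malformed docs
# fail permanently (retrying won't help), while a filter that can't be
# fetched over the network is a transient failure, retried with backoff.
class PermanentFailure(Exception): pass
class TransientFailure(Exception): pass

def doc_to_job(doc, fetch_filter):
    for field in ("source", "target"):
        if field not in doc:
            # Missing required fields: no amount of retrying will help.
            raise PermanentFailure("missing %r" % field)
    job = {"source": doc["source"], "target": doc["target"]}
    if "filter" in doc:
        try:
            job["filter_code"] = fetch_filter(doc["filter"])
        except OSError as err:
            # Network trouble fetching the filter: retry later with backoff.
            raise TransientFailure(str(err))
    return job
```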

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that
          if a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.
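The `rest_for_one` restart semantics can be shown with a toy model: a crash at position i restarts children i through the end, while children to the left keep running. The child ordering follows the text above; this is an illustration, not OTP supervisor code.

```python
# Toy model of rest_for_one supervision. More fundamental children sit on
# the "left"; crashing one restarts it and everything to its "right".
CHILDREN = ["clustering", "connection_pool", "rate_limiter",
            "doc_processor", "multidb_changes"]

def restarted_on_crash(crashed, children=CHILDREN):
    i = children.index(crashed)
    return children[i:]  # the crashed child and everything after it
```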

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 90d8fa2ae693beec943f117c3f2dc5c2e32de958 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=90d8fa2 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          "Replicator" section describing some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit f2847b3ff4ad41200e90a3faacdfe237573018e6 in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=f2847b3 ]

          Introduce couch_replicator_scheduler

          Scheduling replicator can run a large number of replication jobs by scheduling
          them. It will periodically stop some jobs and start new ones. Jobs that fail
          will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit cb95722c1bfc9f251b8568b1e94bf330bbdd0d32 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=cb95722 ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior. Call `start_link`
          with DbSuffix, with an option to skip design docs (`skip_ddocs`). Behavior
          callback functions will be called when shards are created, deleted, found and
          updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 45877136856666d39bb5ebd2fb9ec5a1554b02dc in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4587713 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d6a3e05a3c3de207117eb05d7f45dbc7fcc8fb55 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d6a3e05 ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handle reading and writing to replicator dbs.
            It includes updating state fields, parsing options from documents, and
            making sure replication VDU design document is in sync.
          • couch_replicator_filters : Fetch and manipulate replication filters.
          • couch_replicator_ids : Calculate replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4c6ee5037f08519a8657554d8f505b7b963fe575 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4c6ee50 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in
          `[replicator]` sections:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each reschedule
            cycle scheduler might start or stop up to "max_churn" number of jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs could be
            started (up to max_jobs) in a short period of time.

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          Document processor listens for updates via the couch_multidb_changes module,
          then tries to add replication jobs to the scheduler. Sometimes translating a
          document update to a replication job can fail, either permanently (for
          example, if the document is malformed and missing expected fields) or
          temporarily, if it is a filtered replication and the filter cannot be
          fetched. A failed filter fetch will be retried with an exponential backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that
          if a child crashes, all children to the "right" of it will be restarted (if
          the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already running replications or components like clustering or the
          connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 2d79ed577335b14697644b57b979531685c798e2 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=2d79ed5 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit reference to 3.0

          "Replicator" section describing some new behavior, including a state transition diagram.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit fac8c37a126e3b2b535467796a34c7a8c21c3890 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=fac8c37 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.
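
          The interplay of `max_jobs`, `interval` and `max_churn` can be sketched as a
          single rescheduling cycle. This is an illustrative Python model, not
          CouchDB's Erlang implementation; the oldest-job rotation policy here is a
          simplification:

```python
def reschedule(running, pending, max_jobs, max_churn):
    """One rescheduling cycle: stop up to max_churn of the oldest running
    jobs when others are waiting, then start pending jobs, never exceeding
    max_jobs running at once. Purely illustrative of the config knobs."""
    stopped = 0
    while stopped < max_churn and pending and len(running) >= max_jobs:
        pending.append(running.pop(0))  # unschedule the oldest running job
        stopped += 1
    started = 0
    while started < max_churn and pending and len(running) < max_jobs:
        running.append(pending.pop(0))  # give a waiting job a turn
        started += 1
    return running, pending
```

          With `max_jobs=2`, `max_churn=1`, jobs `[1, 2]` running and `[3, 4]`
          waiting, one cycle stops job 1 and starts job 3, so every job eventually
          gets scheduler time.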

          Replication jobs are added to the scheduler by the document processor or by
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with exponential
          backoff.
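
          The retry policy above amounts to capped exponential backoff; a minimal
          sketch with jitter (the constants here are hypothetical, the real intervals
          are internal to couch_replicator):

```python
import random

def backoff_interval(consecutive_failures, base=5.0, max_interval=3600.0):
    """Delay (seconds) before the next retry: doubles per failure, capped,
    with jitter so many failed jobs don't all retry in lockstep."""
    interval = min(base * (2 ** consecutive_failures), max_interval)
    return interval * random.uniform(0.75, 1.0)
```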

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental: if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.
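
          The `rest_for_one` restart rule can be illustrated with a small model
          (general OTP semantics with the child order described above; not CouchDB
          code, and the child names are approximations):

```python
def rest_for_one_restarts(children, crashed):
    """Under rest_for_one, the crashed child and every child started after
    it (to its "right") are restarted; earlier children survive."""
    idx = children.index(crashed)
    return children[idx:]

# Child order sketched from the description above (fundamental -> derived).
SUPERVISION_ORDER = ["clustering", "connection_pool", "rate_limiter",
                     "doc_processor", "multidb_changes"]
```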

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 9718b97cb87ba239409d13e72c5f369927322e0e in couchdb's branch refs/heads/63012-scheduler from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=9718b97 ]

          Introduce couch_replicator_scheduler

          The scheduling replicator can run a large number of replication jobs by
          scheduling them. It will periodically stop some jobs and start new ones. Jobs
          that fail will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 9895a734f1054eba2fe0cf582574adf28fdec369 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=9895a73 ]

          Cluster ownership module implementation

          This module maintains cluster membership information for replication and
          provides functions to check ownership of replication jobs.

          A cluster membership change is registered only after a configurable
          `cluster_quiet_period` interval has passed since the last node addition or
          removal. This is useful in cases of rolling node reboots in a cluster, in
          order to avoid rescanning for membership changes after every node up and down
          event, and instead do only one rescan at the very end.
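
          The quiet-period logic amounts to debouncing membership events; a minimal
          sketch (hypothetical names, plain seconds rather than the real config
          units):

```python
class QuietPeriod:
    """Report the cluster as stable only after no node additions or
    removals for `quiet_period` seconds; each event restarts the timer."""
    def __init__(self, quiet_period):
        self.quiet_period = quiet_period
        self.last_change = None

    def node_event(self, now):
        self.last_change = now  # a node was added or removed

    def is_stable(self, now):
        return (self.last_change is None
                or now - self.last_change >= self.quiet_period)
```

          During a rolling reboot each node event pushes the deadline back, so a
          single rescan fires once the cluster settles.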

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4d2969d9a3e4899554a96d6711bf1d571277766a in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4d2969d ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, optionally skipping design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found and updated.
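
          In object-oriented terms, such a behavior contract might look like the
          following sketch (a hypothetical Python analogue of the Erlang behavior;
          the callback names are approximations):

```python
class MultiDbChanges:
    """Subclass and override the callbacks; dispatch() routes shard
    lifecycle events to them, mirroring a behavior-style contract."""
    def db_created(self, dbname): pass
    def db_deleted(self, dbname): pass
    def db_found(self, dbname): pass
    def db_change(self, dbname, change): pass

    def dispatch(self, event, dbname, change=None):
        if event == "updated":
            return self.db_change(dbname, change)
        return {"created": self.db_created,
                "deleted": self.db_deleted,
                "found": self.db_found}[event](dbname)
```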

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 25054365a0a198d829a7414f8c0c10e0a5ac6651 in couchdb's branch refs/heads/63012-scheduler from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2505436 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.
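
          The ownership bookkeeping can be sketched as follows (a simplified model
          with hypothetical names; the real module is an Erlang process that
          monitors both sides of each pairing):

```python
class ConnectionTracker:
    """Pairs each checked-out connection with one client. A graceful
    release keeps the connection open for reuse; a crash of either side
    terminates the pair instead of recycling it."""
    def __init__(self):
        self.owner = {}   # connection -> owning client
        self.idle = []    # open connections awaiting a new owner

    def checkout(self, client):
        conn = self.idle.pop() if self.idle else object()  # reuse or "open"
        self.owner[conn] = client
        return conn

    def release(self, conn):
        del self.owner[conn]
        self.idle.append(conn)  # relinquished, not closed: always fast

    def crashed(self, conn):
        self.owner.pop(conn, None)  # paired client/connection terminated
```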

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d89f21bff34a21d7ba296d43b3b0c12021416424 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d89f21b ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetch and manipulate replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the source.
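
          The ID derivation can be illustrated like this (a sketch only; the real
          couch_replicator_ids hashes Erlang terms and includes more inputs, and
          the parameter names here are assumptions):

```python
import hashlib

def replication_id(source, target, options, filter_code=None):
    """Stable ID from a job's defining properties. For filtered
    replications a hash of the filter source is mixed in, so changing
    the filter changes the ID and therefore creates a new job."""
    parts = [source, target, repr(sorted(options.items()))]
    if filter_code is not None:
        parts.append(hashlib.md5(filter_code.encode()).hexdigest())
    return hashlib.md5("|".join(parts).encode()).hexdigest()
```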

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4841774575fb5771a245e5f046a26eaa7ac8dbb4 in couchdb's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4841774 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler might start or stop up to "max_churn" jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter along with "interval" defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            max_jobs) could be started in a short period of time.

          Replication jobs are added to the scheduler by the document processor or by
          the `couch_replicator:replicate/2` function when called from the `_replicate`
          HTTP endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update to a replication job could fail, either
          permanently (if the document is malformed and missing some expected fields,
          for example) or temporarily (if it is a filtered replication and the filter
          cannot be fetched). A failed filter fetch will be retried with exponential
          backoff.

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor structure is `rest_for_one`. This means that if
          a child crashes, all children to the "right" of it will be restarted (if the
          supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental: if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting already
          running replications or components like clustering or the connection pool.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 9718b97cb87ba239409d13e72c5f369927322e0e in couchdb's branch refs/heads/master from Robert Newson
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=9718b97 ]

          Introduce couch_replicator_scheduler

          The scheduling replicator can run a large number of replication jobs by
          scheduling them. It will periodically stop some jobs and start new ones. Jobs
          that fail will be penalized with an exponential backoff.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4d2969d9a3e4899554a96d6711bf1d571277766a in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4d2969d ]

          Implement multi-db shard change monitoring

          Monitor shards which match a suffix for creation, deletion, and doc updates.

          To use it, implement the `couch_multidb_changes` behavior and call
          `start_link` with a DbSuffix, optionally skipping design docs
          (`skip_ddocs`). Behavior callback functions will be called when shards are
          created, deleted, found and updated.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 25054365a0a198d829a7414f8c0c10e0a5ac6651 in couchdb's branch refs/heads/master from Benjamin Bastian
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2505436 ]

          Share connections between replications

          This commit adds functionality to share connections between
          replications. This is to solve two problems:

          • Prior to this commit, each replication would create a pool of
            connections and hold onto those connections as long as the replication
            existed. This was wasteful and caused CouchDB to use many unnecessary
            connections.
          • When the pool was being terminated, the pool would block while the
            socket was closed. This would cause the entire replication scheduler
            to block. By reusing connections, connections are never closed by
            clients. They are only ever relinquished. This operation is always
            fast.

          This commit adds an intermediary process which tracks which connection
          processes are being used by which client. It monitors clients and
          connections. If a client or connection crashes, the paired
          client/connection will be terminated.

          A client can gracefully relinquish ownership of a connection. If that
          happens, the connection will be shared with another client. If the
          connection remains idle for too long, it will be closed.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit d89f21bff34a21d7ba296d43b3b0c12021416424 in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=d89f21b ]

          Refactor utils into 3 modules

          Over the years utils accumulated a lot of functionality. Clean up a bit by
          separating it into specific modules according to semantics:

          • couch_replicator_docs : Handles reading and writing to replicator dbs.
            This includes updating state fields, parsing options from documents, and
            making sure the replication VDU design document is in sync.
          • couch_replicator_filters : Fetch and manipulate replication filters.
          • couch_replicator_ids : Calculates replication IDs. Handles versioning and
            pretty formatting of IDs. Filtered replications using user filter functions
            incorporate a hash of the filter code into the calculation; in that case the
            couch_replicator_filters module is called to fetch the filter from the source.

          Jira: COUCHDB-3324

          jira-bot ASF subversion and git services added a comment -

          Commit 4841774575fb5771a245e5f046a26eaa7ac8dbb4 in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4841774 ]

          Stitch scheduling replicator together.

          Glue together all the scheduling replicator pieces.

          Scheduler is the main component. It can run a large number of replication jobs
          by switching between them, stopping and starting some periodically. Jobs
          which fail are backed off exponentially. Normal (non-continuous) jobs will be
          allowed to run to completion to preserve their current semantics.

          Scheduler behavior can be configured by these configuration options in the
          `[replicator]` section:

          • `max_jobs` : Number of actively running replications. Making this too high
            could cause performance issues. Making it too low could mean replication jobs
            might not have enough time to make progress before getting unscheduled again.
            This parameter can be adjusted at runtime and will take effect during the next
            rescheduling cycle.
          • `interval` : Scheduling interval in milliseconds. During each rescheduling
            cycle the scheduler may start or stop up to `max_churn` jobs.
          • `max_churn` : Maximum number of replications to start and stop during
            rescheduling. This parameter, along with `interval`, defines the rate of job
            replacement. During startup, however, a much larger number of jobs (up to
            `max_jobs`) could be started in a short period of time.
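          Taken together, the options above might be set in a CouchDB config file like
          this (the values here are illustrative, not recommended defaults):

          ```ini
          [replicator]
          ; run at most 500 replication jobs concurrently
          max_jobs = 500
          ; run a rescheduling cycle every 60 seconds
          interval = 60000
          ; start/stop at most 20 jobs per rescheduling cycle
          max_churn = 20
          ```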

          Replication jobs are added to the scheduler by the document processor or from
          the `couch_replicator:replicate/2` function when called from `_replicate` HTTP
          endpoint handler.

          The document processor listens for updates via the couch_multidb_changes
          module, then tries to add replication jobs to the scheduler. Sometimes
          translating a document update into a replication job can fail, either
          permanently (if, for example, the document is malformed and missing expected
          fields) or temporarily, if it is a filtered replication and the filter cannot
          be fetched. A failed filter fetch will be retried with an exponential backoff.
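          The retry policy for failed filter fetches can be sketched as a standard
          exponential backoff with jitter. This is a minimal illustration, not the
          replicator's actual code; the base delay and cap are made-up values:

          ```python
          import random

          def backoff_delay(failures, base=0.25, cap=600.0):
              """Illustrative exponential backoff: the delay doubles with each
              consecutive failure, up to a cap, with jitter applied so that
              many failing jobs do not retry in lockstep."""
              delay = min(cap, base * (2 ** failures))
              # Jitter: pick a delay in the upper half of the window.
              return delay * random.uniform(0.5, 1.0)
          ```

          The cap keeps long-failing jobs from waiting unboundedly, while the doubling
          keeps repeatedly failing jobs from hammering an unreachable endpoint.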

          couch_replicator_clustering is in charge of monitoring cluster membership
          changes. When membership changes, after a configurable quiet period, a rescan
          will be initiated. The rescan will shuffle replication jobs to make sure each
          replication job is running on only one node.

          A new set of stats was added to introspect scheduler and doc processor
          internals.

          The top replication supervisor uses the `rest_for_one` strategy. This means
          that if a child crashes, all children to the "right" of it will be restarted
          (if the supervisor hierarchy is visualized as an upside-down tree). Clustering,
          connection pool and rate limiter are towards the "left" as they are more
          fundamental; if the clustering child crashes, most other components will be
          restarted. The doc processor and multi-db changes children are towards the
          "right". If they crash, they can be safely restarted without affecting
          already-running replications or components like clustering or the connection
          pool.
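          The `rest_for_one` restart semantics described above can be modeled with a tiny
          Python simulation. This is a toy model of the OTP restart strategy, not CouchDB
          code; the child names are taken from the paragraph above and their exact order
          here is assumed for illustration:

          ```python
          def rest_for_one_restart(children, crashed):
              """Toy model of OTP's rest_for_one strategy: when a child crashes,
              that child and every child started after it (to its "right") are
              restarted; children started before it are left alone."""
              i = children.index(crashed)
              return children[i:]

          # Children ordered roughly as in the text: fundamental components first.
          sup = ["clustering", "connection_pool", "rate_limiter",
                 "doc_processor", "multidb_changes"]
          ```

          For example, a crash of `doc_processor` restarts only it and
          `multidb_changes`, while a crash of `clustering` restarts everything.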

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 27d1223cfbe6740b22a426d98344d806c5314ed2 in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=27d1223 ]

          Disabling replication startup jitter in Windows makefile

          A similar change has already been made to the *nix Makefile.

          This is so the Javascript integration test suite doesn't time out
          when running in Travis. We don't run Windows tests in Travis, but
          this should speed things up a bit, and it's nice to keep both in
          sync.

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 365f40906c373f2ba7920bafa071231da7ba9a07 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=365f409 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          The "Replicator" section describes some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 849981ce859bd085b04aa713652445408f4b4060 in couchdb-documentation's branch refs/heads/63012-scheduler from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=849981c ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          The "Replicator" section describes some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 7705a56c9b413ca9b1fcdcdc27e0ca0e96cba05b in couchdb-documentation's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb-documentation.git;h=7705a56 ]

          Documentation for the scheduling replicator

          "What's New" update includes feature improvements and also an implicit to 3.0

          The "Replicator" section describes some new behavior and includes a state transition diagram.

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f9e2e5a4b8e986343695082a3ccc4a470446e2bb in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=f9e2e5a ]

          Fix `badarg` when querying replicator's _scheduler/docs endpoint

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 63278f25733ef7b08d84060192b9e67badec8072 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=63278f2 ]

          Bump docs to include scheduling replicator documentation

          Jira: COUCHDB-3324

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f9e2e5a4b8e986343695082a3ccc4a470446e2bb in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Nick Vatamaniuc
          [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=f9e2e5a ]

          Fix `badarg` when querying replicator's _scheduler/docs endpoint

          Jira: COUCHDB-3324


            People

            • Assignee:
              Unassigned
              Reporter:
              vatamane Nick Vatamaniuc
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development