  CouchDB / COUCHDB-3271

Replications crash with 'kaboom' exit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In a few cases, replications were observed crashing with a `kaboom` exit. This happens here:

      https://github.com/apache/couchdb-couch-replicator/blob/cb41bacb2a06613649df46d62249afebda42b8c0/src/couch_replicator_api_wrap.erl#L236

      The crash occurs during an open_revs call for one of the documents: the changes feed found the document, but the replicator could not then fetch its revisions.

      The reason is that the open_revs GET request returns an empty result when more than one node is in maintenance mode.

        Activity

        vatamane Nick Vatamaniuc added a comment - https://github.com/apache/couchdb-fabric/pull/84
        jira-bot ASF subversion and git services added a comment -

        Commit 6fa8d3ca004fb8fa658bc30abb0705e88fd810ab in couchdb-fabric's branch refs/heads/master from Nick Vatamaniuc
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-fabric.git;h=6fa8d3c ]

        Fix open_revs fabric eunit test

        In check_workers_error_skipped last worker should be w3 not w2.

        COUCHDB-3271

        jira-bot ASF subversion and git services added a comment -

        Commit ec2235196d7195afab59cedc2d61a02b11596ab4 in couchdb-fabric's branch refs/heads/master from Nick Vatamaniuc
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-fabric.git;h=ec22351 ]

        In open_revs, do not count errors in quorum threshold calculation

        Previously the quorum check looked only at the number of replies and decided
        quorum was met even if it had received only errors. For example, if 2 nodes
        are in maintenance mode it might receive this sequence of replies:

        `rexi_EXIT, rexi_EXIT, ok`

        In that case, after the first two replies it would decide that quorum (r=2)
        was met, return what it had so far ([]), and kill the remaining worker, which
        was about to return a valid revision.

        The fix is to keep track of error replies and subtract them when deciding if
        quorum was met.

        COUCHDB-3271
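The quorum rule described in the commit message above can be modelled in a few lines. The following is a hypothetical Python sketch, not fabric's Erlang implementation: worker replies are modelled as tuples, a `rexi_EXIT` from a node in maintenance mode is modelled as an `"error"` entry, and the rev id `1-abc` is made up. It also assumes that, when quorum is never reached, the coordinator returns whatever it has once every worker has replied. The `skip_errors` flag switches between the old rule (every reply counts toward r) and the fixed rule (error replies are subtracted).

```python
"""Hypothetical model of the open_revs quorum check before and after the fix.

This is NOT fabric's code. Each worker reply is ("ok", [revs]) or
("error", reason); "error" stands in for a rexi_EXIT from a node in
maintenance mode. Rev ids such as "1-abc" are made up.
"""

from typing import Any, List, Tuple

Reply = Tuple[str, Any]


def open_revs(replies: List[Reply], r: int, skip_errors: bool) -> Tuple[List[str], int]:
    """Return (revs, workers_killed) for replies arriving in the given order.

    skip_errors=False models the old rule: every reply counts toward r.
    skip_errors=True models the fix: error replies are subtracted first.
    """
    revs: List[str] = []
    errors = 0
    for received, (kind, value) in enumerate(replies, start=1):
        if kind == "ok":
            for rev in value:
                if rev not in revs:  # merge identical revs from different copies
                    revs.append(rev)
        else:
            errors += 1
        counted = received - errors if skip_errors else received
        if counted >= r:
            # Quorum considered met: return what we have, kill remaining workers.
            return revs, len(replies) - received
    # Quorum never met: every worker has replied, so return what we have.
    return revs, 0


if __name__ == "__main__":
    seq = [("error", "rexi_EXIT"), ("error", "rexi_EXIT"), ("ok", ["1-abc"])]
    print(open_revs(seq, r=2, skip_errors=False))  # ([], 1)
    print(open_revs(seq, r=2, skip_errors=True))   # (['1-abc'], 0)
```

With the old rule, the two errors satisfy r=2, so the empty list is returned and the worker holding the valid revision is killed. With the fix, errors never count toward quorum, so the coordinator waits for the third worker and returns its revision.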

        jira-bot ASF subversion and git services added a comment -

        Commit d90326bd2dea5b3941843c0dac77fa2ee698e993 in couchdb's branch refs/heads/master from Nick Vatamaniuc
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=d90326b ]

        Bump fabric dependency for open_revs quorum fix

        COUCHDB-3271

        jira-bot ASF subversion and git services added a comment -

        Commit 6fa8d3ca004fb8fa658bc30abb0705e88fd810ab in couchdb-fabric's branch refs/heads/2971-count-distinct from Nick Vatamaniuc
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-fabric.git;h=6fa8d3c ]

        Fix open_revs fabric eunit test

        In check_workers_error_skipped last worker should be w3 not w2.

        COUCHDB-3271

        jira-bot ASF subversion and git services added a comment -

        Commit ec2235196d7195afab59cedc2d61a02b11596ab4 in couchdb-fabric's branch refs/heads/2971-count-distinct from Nick Vatamaniuc
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-fabric.git;h=ec22351 ]

        In open_revs, do not count errors in quorum threshold calculation

        Previously the quorum check looked only at the number of replies and decided
        quorum was met even if it had received only errors. For example, if 2 nodes
        are in maintenance mode it might receive this sequence of replies:

        `rexi_EXIT, rexi_EXIT, ok`

        In that case, after the first two replies it would decide that quorum (r=2)
        was met, return what it had so far ([]), and kill the remaining worker, which
        was about to return a valid revision.

        The fix is to keep track of error replies and subtract them when deciding if
        quorum was met.

        COUCHDB-3271


          People

          • Assignee:
            Unassigned
            Reporter:
            vatamane Nick Vatamaniuc
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
