Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.2
    • Component/s: Replication
    • Labels:
      None
    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      I've thought about this long and hard and probably should have submitted the bug a long time ago. I've also run this in production for months.
      When a checkpoint conflict occurs, aborting is almost always the right thing to do.

      If there is a rev mismatch it could mean there are two conflicting (continuous and one-shot) replications running between the same hosts. Without reloading the history documents, checkpoints will continue to fail forever. This could leave us in a state with many replicated changes but no checkpoints.
      Similarly, a successful checkpoint but a lost/timed-out response could cause this situation.

      Since the supervisor will restart the replication anyway, I think it's safer to abort and retry.
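
      The abort-and-retry idea can be illustrated with a minimal sketch. It is written in Python rather than CouchDB's Erlang, and every name in it (`InMemoryCheckpointStore`, `run_replication`, `supervise`, `interfere_at`) is hypothetical; none of it is taken from the attached patches. It assumes the checkpoint is guarded by a rev check, so a write based on a stale rev is rejected.

```python
class CheckpointConflict(Exception):
    """Raised when the stored checkpoint rev differs from the one we hold."""


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a replication history document."""

    def __init__(self):
        self.rev = 0
        self.seq = 0

    def write(self, expected_rev, seq):
        # Mimic optimistic concurrency: reject writes based on a stale rev.
        if expected_rev != self.rev:
            raise CheckpointConflict(
                f"expected rev {expected_rev}, found {self.rev}"
            )
        self.rev += 1
        self.seq = seq
        return self.rev


def run_replication(store, changes, interfere_at=None):
    """Replicate `changes`, checkpointing after each one; abort on conflict."""
    rev = store.rev  # the history document is loaded once, at startup
    for seq in changes:
        if seq == interfere_at:
            # Simulate a second replication updating the same checkpoint.
            store.write(store.rev, store.seq)
        # A conflict raised here propagates and ends this run, instead of
        # being ignored while replication carries on without checkpoints.
        rev = store.write(rev, seq)
    return store.seq


def supervise(store, changes, max_restarts=3, interfere_at=None):
    """Restart the replication after an abort, reloading checkpoint state."""
    for attempt in range(max_restarts + 1):
        try:
            # Resume from the last sequence that was actually checkpointed.
            remaining = [s for s in changes if s > store.seq]
            return run_replication(store, remaining, interfere_at), attempt
        except CheckpointConflict:
            interfere_at = None  # the interference happens once in this sketch
    raise RuntimeError("too many checkpoint conflicts")
```

      In this sketch, calling `supervise(InMemoryCheckpointStore(), [1, 2, 3, 4], interfere_at=3)` aborts once at sequence 3, then restarts with the fresh rev and finishes from the last recorded checkpoint. Had the conflict been ignored, every later checkpoint write would keep failing against the stale rev.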

      1. paranoid_checkpoint_failure.patch
        2 kB
        Randall Leeds
      2. paranoid_checkpoint_failure_v2.patch
        2 kB
        Randall Leeds
      3. COUCHDB-1080-fdmanana.patch
        4 kB
        Filipe Manana
      4. COUCHDB-1080-2-fdmanana.patch
        5 kB
        Filipe Manana
      5. COUCHDB-1080-3-fdmanana.patch
        7 kB
        Filipe Manana
      6. COUCHDB-1080-4-fdmanana.patch
        9 kB
        Filipe Manana

          People

          • Assignee:
            Filipe Manana
            Reporter:
            Randall Leeds