  1. CouchDB
  2. COUCHDB-1080

fail fast with checkpoint conflicts

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.2
    • Component/s: Replication
    • Labels: None
    • Skill Level: Regular Contributors Level (Easy to Medium)

    Description

      I've thought about this long and hard and probably should have submitted the bug a long time ago. I've also run this in production for months.
      When a checkpoint conflict occurs, aborting is almost always the right thing to do.

      If there is a rev mismatch, it could mean that two conflicting replications (one continuous and one one-shot) are running between the same hosts. Without reloading the history documents, checkpoints will continue to fail forever. This could leave us in a state with many replicated changes but no checkpoints.
      Similarly, a checkpoint that succeeds but whose response is lost or times out could cause this situation.

      Since the supervisor will restart the replication anyway, I think it's safer to abort and retry.
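
      The abort-and-retry behavior described above can be sketched roughly as follows. This is an illustration in Python only, not the CouchDB (Erlang) code or any of the attached patches. CheckpointStore, replicate, supervise, and competing_writer are hypothetical names, and the integer revision is a simplified stand-in for a document rev. The point it tries to show is that a restarted replication reloads the history document and so recovers, whereas a replication that kept its stale rev would conflict on every later checkpoint.

```python
class CheckpointConflict(Exception):
    """Raised when the stored checkpoint revision differs from the one we hold."""


class CheckpointStore:
    """In-memory stand-in (assumption) for a database holding one history document."""

    def __init__(self):
        self.rev = 0  # revision of the stored history document
        self.seq = 0  # last checkpointed source sequence

    def read(self):
        return self.rev, self.seq

    def write(self, expected_rev, seq):
        # Optimistic concurrency: reject the write if the caller's revision is stale.
        if expected_rev != self.rev:
            raise CheckpointConflict(
                f"expected rev {expected_rev}, found {self.rev}"
            )
        self.rev += 1
        self.seq = seq
        return self.rev


def replicate(store, changes, interfere=None):
    """Run one replication attempt and abort on the first checkpoint conflict."""
    rev, seq = store.read()  # the history document is reloaded on every (re)start
    for change_seq in changes:
        if change_seq <= seq:
            continue  # already covered by an earlier checkpoint
        if interfere is not None:
            interfere(store, change_seq)
        # Fail fast: a CheckpointConflict is deliberately not caught here.
        rev = store.write(rev, change_seq)
        seq = change_seq
    return seq


def supervise(store, changes, interfere=None, max_restarts=5):
    """Hypothetical supervisor that restarts the replication after each abort."""
    for attempt in range(1, max_restarts + 1):
        try:
            return replicate(store, changes, interfere), attempt
        except CheckpointConflict:
            continue  # restart; the next attempt re-reads the history document
    raise RuntimeError("replication kept conflicting")


def competing_writer():
    """Simulate a second replication that checkpoints once, just before seq 3."""
    fired = []

    def interfere(store, change_seq):
        if change_seq == 3 and not fired:
            fired.append(True)
            store.write(store.rev, 2)  # the other replication wins the race

    return interfere
```

      Calling supervise(CheckpointStore(), [1, 2, 3, 4], competing_writer()) aborts the first attempt at the conflicting checkpoint and completes on the second attempt, resuming from the last recorded sequence instead of starting over.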

      Attachments

        1. COUCHDB-1080-4-fdmanana.patch
          9 kB
          Filipe David Borba Manana
        2. COUCHDB-1080-3-fdmanana.patch
          7 kB
          Filipe David Borba Manana
        3. COUCHDB-1080-2-fdmanana.patch
          5 kB
          Filipe David Borba Manana
        4. COUCHDB-1080-fdmanana.patch
          4 kB
          Filipe David Borba Manana
        5. paranoid_checkpoint_failure_v2.patch
          2 kB
          Randall Leeds
        6. paranoid_checkpoint_failure.patch
          2 kB
          Randall Leeds

          People

            fdmanana Filipe David Borba Manana
            tilgovi Randall Leeds