[COUCHDB-1080] fail fast with checkpoint conflicts - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.2
Fix Version/s: 1.2
Component/s: Replication
Labels: None

Skill Level: Regular Contributors Level (Easy to Medium)

Description

I've thought about this long and hard and probably should have submitted the bug a long time ago. I've also run this in production for months.
When a checkpoint conflict occurs, aborting is almost always the right thing to do.

If there is a rev mismatch, it could mean that two conflicting replications (one continuous and one one-shot) are running between the same hosts. Unless the replication reloads the history documents, checkpoints will continue to fail forever. This could leave us in a state with many replicated changes but no checkpoints.
Similarly, a checkpoint that succeeds but whose response is lost or times out could cause the same situation.

Since the supervisor will restart the replication anyway, I think it's safer to abort and retry.
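
To make the proposal concrete, here is a minimal sketch of the fail-fast behaviour in Python. It is not the attached patch and not CouchDB's replicator code: `CheckpointStore`, `CheckpointConflict`, `replicate` and `supervise` are hypothetical names, and the in-memory store only stands in for the history documents. The assumption is that a checkpoint write carries the rev loaded at startup, so a mismatch raises instead of being retried with the same stale rev.

```python
class CheckpointConflict(Exception):
    """Raised when the stored checkpoint rev differs from the rev we hold."""


class CheckpointStore:
    """Hypothetical in-memory stand-in for a replication history document."""

    def __init__(self):
        self.rev = 0
        self.seq = 0

    def write(self, expected_rev, seq):
        # Reject the write if another writer has already advanced the rev.
        if expected_rev != self.rev:
            raise CheckpointConflict(
                f"expected rev {expected_rev}, found {self.rev}"
            )
        self.rev += 1
        self.seq = seq
        return self.rev


def replicate(store, batches):
    """Run one replication attempt; any checkpoint conflict aborts it."""
    rev, seq = store.rev, store.seq  # history is loaded once, at startup
    for batch_seq in batches:
        if batch_seq <= seq:
            continue  # already covered by the recorded checkpoint
        # ... the changes up to batch_seq would be copied here ...
        rev = store.write(rev, batch_seq)  # a conflict propagates to the caller
        seq = batch_seq
    return seq


def supervise(store, batches, max_restarts=3):
    """Restart an aborted replication so that it reloads the history."""
    for attempt in range(max_restarts + 1):
        try:
            return replicate(store, batches), attempt
        except CheckpointConflict:
            continue  # the next attempt starts from the stored rev and seq
    raise RuntimeError("too many checkpoint conflicts")
```

In this sketch the restarted attempt picks up the rev and seq that the competing writer recorded, so it skips the work already checkpointed instead of failing every later checkpoint with a stale rev.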

Attachments

COUCHDB-1080-4-fdmanana.patch, 04/Mar/11 15:18, 9 kB, Filipe David Borba Manana
COUCHDB-1080-3-fdmanana.patch, 03/Mar/11 19:27, 7 kB, Filipe David Borba Manana
COUCHDB-1080-2-fdmanana.patch, 03/Mar/11 17:17, 5 kB, Filipe David Borba Manana
COUCHDB-1080-fdmanana.patch, 03/Mar/11 14:20, 4 kB, Filipe David Borba Manana
paranoid_checkpoint_failure_v2.patch, 03/Mar/11 03:07, 2 kB, Randall Leeds
paranoid_checkpoint_failure.patch, 02/Mar/11 06:26, 2 kB, Randall Leeds

People

Assignee: Filipe David Borba Manana

Reporter: Randall Leeds

Votes: 0

Watchers: 0

Dates

Created: 02/Mar/11 06:23

Updated: 07/Mar/11 21:28

Resolved: 05/Mar/11 11:21