[COUCHDB-416] Replicating shards into a single aggregation node may cause endless respawning - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.9
Fix Version/s: 0.10
Component/s: Database Core
Labels: None
Environment: couchdb 0.9.0.r766883 CentOS x86_64

Description

I have a set of CouchDB instances, each one acting as a shard for a large set of data.
Occasionally, we replicate each instance's database into a different CouchDB instance. We always "pull" replicate (see the attached image).
When we do this, we often see errors like this on the target instance:

[Thu, 16 Jul 2009 13:52:32 GMT] [error] [emulator] Error in process <0.29787.102> with exit value: {function_clause,[{lists,map,[#Fun<couch_rep.6.75683565>,undefined]},{couch_rep,enum_docs_since,4}]}

[Thu, 16 Jul 2009 13:52:32 GMT] [error] [<0.7456.6>] replication enumerator exited with {function_clause,
    [{lists,map,
        [#Fun<couch_rep.6.75683565>,undefined]},
     {couch_rep,enum_docs_since,4}]}
 .. respawning

Once this starts, it is fatal to the CouchDB instance. It logs these messages at more than 1000 per second (log level = severe) and chews up disk space.

No other errors (apart from an HTTP timeout) are seen.

After a database had gone into "respawning", we shut down the target node, cleared the logs, and restarted the target node. We tailed the log and all was quiet. As soon as a single replication was triggered against this database again, it immediately went back into respawning hell. There were no stacked replications in this case.

From this it seems that once a database goes into "respawning" it cannot recover (when your environment/setup requires replication to always occur).
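
For anyone trying to reproduce this, here is a rough sketch of the kind of "pull" replication described above: the request goes to the target (aggregation) node's `/_replicate` resource, with a remote shard URL as the source and a local database name as the target. The host names, database name, and helper functions are hypothetical and are not taken from our setup; the sketch only builds the requests and does not send them.

```python
import json
import urllib.request


def build_pull_request(target_base, source_base, db_name):
    """Build (but do not send) a POST /_replicate request for the target node.

    "Pull" here means the request is issued to the target, the source is a
    remote URL, and the target is a database local to that node.
    """
    body = {
        "source": f"{source_base.rstrip('/')}/{db_name}",  # remote shard database
        "target": db_name,                                 # local database on the target
    }
    return urllib.request.Request(
        url=f"{target_base.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_requests_for_shards(target_base, shard_bases, db_name):
    """One pull-replication request per shard, all aimed at the same target node."""
    return [build_pull_request(target_base, shard, db_name) for shard in shard_bases]


# Hypothetical hosts, for illustration only.
requests_to_send = pull_requests_for_shards(
    "http://aggregator.example:5984",
    ["http://shard1.example:5984", "http://shard2.example:5984"],
    "mydb",
)
```

Each request could then be sent one at a time with `urllib.request.urlopen(req, timeout=...)`, which is roughly where the HTTP timeout mentioned above would show up on the caller's side.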

Attachments

Picture 2.png
16/Jul/09 14:57
49 kB
Enda Farrell

People

Assignee: Adam Kocoloski

Reporter: Enda Farrell

Votes: 0

Watchers: 0

Dates

Created: 16/Jul/09 14:55

Updated: 22/Feb/12 06:09

Resolved: 15/Aug/09 05:15