Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
Hi all,
I have an issue that appears to have followed me from v1.2.1 on Erlang R14 through an upgrade to v1.4.0 on R16B01.
I have 20 "remote" nodes and one "central" node. Each remote instance is configured with bidirectional replication (i.e. no replication is defined on the Central node directly). There is a single main database of ~600,000 documents, ~11GB in size.
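For anyone trying to reproduce the topology, here is a rough sketch of the pair of replication documents each remote node would hold. The host name, database name and document ids are made up for illustration, and "continuous": true is an assumption; the actual settings on these nodes may differ.

```shell
#!/bin/sh
# Hypothetical values: replace with the real central host and database name.
CENTRAL_URL="${CENTRAL_URL:-http://central.example.com:5984}"
LOCAL_DB="${LOCAL_DB:-maindb}"

# Print one replication document as a single line of JSON.
# $1 = document id, $2 = source, $3 = target
replication_doc() {
  printf '{"_id":"%s","source":"%s","target":"%s","continuous":true}\n' "$1" "$2" "$3"
}

# Pull: Central node -> this remote node.
pull_doc() {
  replication_doc "pull_from_central" "$CENTRAL_URL/$LOCAL_DB" "$LOCAL_DB"
}

# Push: this remote node -> Central node.
push_doc() {
  replication_doc "push_to_central" "$LOCAL_DB" "$CENTRAL_URL/$LOCAL_DB"
}

pull_doc
push_doc
```

Each printed document could then be saved into the remote node's _replicator database. Nothing is created on the Central node, which matches the setup described above.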
On the remote nodes, and more frequently on the Central node, I get huge (3000+ line) errors in the logs, seemingly intermittently; I have yet to track down the root cause. The open file handle limit and ERL_MAX_PORTS are both set to values upwards of 16k.
Other stats:
$ sudo su - couchdb -c "lsof | grep -c ."
1511
$ sudo netstat -npla | grep "ESTAB" | grep -c .
310
$ ps -ef | grep -c "^couchdb"
19
An example log from a Remote node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01.20140218.log
An example log from the Central node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01_central.20140218.log
The main error line appears to be "{error,{error,req_timedout}}}}", for "_bulk_docs" on the remote nodes and for "_revs_diff" on the Central node.
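To get a rough feel for how often each of these strings turns up, something like the following could be run against a downloaded copy of a log. This is only a sketch: the function name is made up, and because a single error report spans thousands of lines, the numbers are counts of matching lines, not counts of distinct errors.

```shell
#!/bin/sh
# Count the lines in a file that contain a fixed string.
# $1 = string to look for, $2 = path to the log file
count_matches() {
  # grep -c exits non-zero when there are no matches but still prints 0,
  # so the exit status is deliberately ignored.
  grep -c -F -- "$1" "$2" || true
}

# Only produce a report when a log file path is given as the first argument.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
  for pattern in req_timedout _bulk_docs _revs_diff; do
    printf '%s: %s\n' "$pattern" "$(count_matches "$pattern" "$1")"
  done
fi
```

Comparing the three counts between a remote log and the Central log should show whether the "_bulk_docs" versus "_revs_diff" split holds across the whole file.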