Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
Hi,
I recently went to production with a CouchDB 2 instance, but since usage increased I have been running into severe issues: the database slows down, stops indexing and returning my views, and ultimately crashes because it consumes too much memory. A freshly started instance usually runs at ~300 MB of RAM; after a few hours it jumps to almost 3 GB, at which point it crashes because the system runs out of memory.
I'm running my CouchDB instance on a Kubernetes cluster hosted on Google Cloud, using the latest 2.0.0 image of klaemo/couchdb2 as the base Docker image.
This instance currently has 792 active, continuous replications running, which I assume is what is causing the slowdown: when I disabled them and rebooted, the database appeared to run fine.
When I check the logs, I see a lot of error messages like these, which I'm assuming might be the culprit:
```
[error] 2017-03-23T09:55:06.804700Z nonode@nohost <0.14942.246> 6388d61064 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/00000000-1fffffff/db_name.1478875836">>,[,{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},100]}} [{gen_server,call,3,[{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[error] 2017-03-23T09:55:06.834468Z nonode@nohost <0.28090.247> 6aa0e19144 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},200]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,86}]},{couch_db,open,2,[{file,"src/couch_db.erl"},{line,91}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[error] 2017-03-23T09:55:06.834516Z nonode@nohost <0.1093.212> 1c317bf387 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},200]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,86}]},{couch_db,open,2,[{file,"src/couch_db.erl"},{line,91}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
```
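In case it helps with triage, here is a rough sketch of how these entries could be tallied per shard to see whether the timeouts concentrate on a few shards. It is only an illustration: the regular expression is derived solely from the three entries pasted above, `count_shard_timeouts` is a made-up helper name, and the sample lines are shortened copies of those entries. It only needs the first physical line of each entry, so wrapped log lines are not a problem.

```python
import re
from collections import Counter

# Captures the shard path from a rexi_server timeout entry, e.g.
#   rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/.../db_name">>
# Assumption: the pattern is based only on the log excerpt in this report.
SHARD_RE = re.compile(r'rexi_server exit:\{timeout,.*?\{open,<<"(shards/[^"]+)">>')


def count_shard_timeouts(lines):
    """Return a Counter mapping shard path -> number of timeout entries."""
    counts = Counter()
    for line in lines:
        match = SHARD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened copies of the entries above (stack traces omitted).
sample = [
    '[error] 2017-03-23T09:55:06.804700Z nonode@nohost <0.14942.246> 6388d61064 '
    'rexi_server exit:{timeout,{gen_server,call,[couch_server,'
    '{open,<<"shards/00000000-1fffffff/db_name.1478875836">>,[',
    '[error] 2017-03-23T09:55:06.834468Z nonode@nohost <0.28090.247> 6aa0e19144 '
    'rexi_server exit:{timeout,{gen_server,call,[couch_server,'
    '{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},',
    '[error] 2017-03-23T09:55:06.834516Z nonode@nohost <0.1093.212> 1c317bf387 '
    'rexi_server exit:{timeout,{gen_server,call,[couch_server,'
    '{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},',
]

counts = count_shard_timeouts(sample)
for shard, n in counts.most_common():
    print(n, shard)
```

On a real log file, the same function can be fed the open file object directly, since it just iterates over lines.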
Could I have some help with this issue? I'm not sure whether this is an actual memory leak, or whether, for the number of replications I currently have, I should simply expect to need a node with more RAM in order to process all the live replications.