Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Invalid
-
6.1
-
None
-
The following is the usage on each of the Solr Nodes:
Tasks: 254 total, 1 running, 252 sleeping, 0 stopped, 1 zombie
%Cpu(s): 0.4 us, 0.3 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 20392276 total, 4169296 free, 2917012 used, 13305968 buff/cache
KiB Swap: 5111804 total, 5111636 free, 168 used. 16058184 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21250 solr 20 0 23.599g 1.184g 228440 S 2.0 6.1 59:55.91 java
Solr is running on 5 machines with similar configuration:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Stepping: 4
CPU MHz: 2799.033
BogoMIPS: 5600.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-3
Description
We host a SolrCloud cluster of 5 Solr nodes, with 3 ZooKeeper nodes to maintain the cloud. We have over 70 million documents spread across 13 collections, with about 40K more documents added every day in near real time, at intervals of 5 to 6 minutes.
The system had been working as expected for the last 7 months until we suddenly saw the following exception and all of our instances went offline. We restarted the instances, and the cloud ran smoothly for three days before it crashed again.
The exception logged before it goes down is as follows:
3542285 ERROR (OverseerCollectionConfigSetProcessor-98221003671470081-prod-solr-node01:9080_solr-n_0000000106) [ ] o.a.s.c.OverseerTaskProcessor
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:348)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:345)
at org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:384)
at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:191)
at java.lang.Thread.run(Unknown Source)
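
The trace passes through ZkCmdExecutor.retryOperation before the ConnectionLoss surfaces, which suggests the read of /overseer_elect/leader was retried and still failed. The following is a minimal sketch of that general retry-then-give-up pattern, not Solr's actual implementation: `ConnectionLoss`, `retry_operation`, and the parameters `retry_count`, `base_delay_s`, and `sleep` are hypothetical names chosen for illustration.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for ZooKeeper's ConnectionLossException."""


def retry_operation(operation, retry_count=5, base_delay_s=0.5, sleep=time.sleep):
    """Call operation(), retrying only when it raises ConnectionLoss.

    The delay grows linearly with the attempt number. Once retry_count
    attempts have failed, the last ConnectionLoss is re-raised to the caller.
    """
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(retry_count):
        try:
            return operation()
        except ConnectionLoss as exc:
            last_error = exc
            # Back off before the next attempt (assumed policy, for illustration).
            sleep(base_delay_s * (attempt + 1))
    raise last_error
```

Under these assumptions, a caller only sees the exception once every attempt has failed, so an error like the one above would point to a connection problem that outlasted the whole retry window rather than a single dropped request.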