Affects Version/s: 6.2, master (9.0), 8.3
Fix Version/s: 8.5
We hit an exception with 8.3 that someone else has also reported on Stack Overflow:
I recently converted a Solr 7.x + ZooKeeper 3.4.14 setup to Solr 8.3 + ZK 3.5.6, and depending on how I start the Solr nodes I'm getting a sync exception.
My setup uses 3 ZK nodes and 2 Solr nodes (let's call them A and B). The collection that has this problem has 1 shard and 2 replicas. I've noticed 2 situations: (1) which works fine and (2) which does not work.
1) This works: I start Solr node A and wait until its replica is elected leader ("green" in the Solr interface 'Cloud'->'Graph'), which takes about 2 min, and only then start Solr node B. Both replicas are active and the one in A is the leader.
2) This does NOT work: I start Solr node A, and a few seconds later I start Solr node B (that is, before the 'A' replica is elected leader and while it is still "Down" in the Solr interface). In this case I get the following exception:
ERROR (coreZkRegister-1-thread-2-processing-n:192.168.15.20:8986_solr x:alldata_shard1_replica_n1 c:alldata s:shard1 r:core_node3) [c:alldata s:shard1 r:core_node3 x:alldata_shard1_replica_n1] o.a.s.c.SyncStrategy Sync Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 99
It seems that if both Solr nodes are started soon after each other, then ZK cannot elect one as leader. This error only appears in the solr.log of node A, even if I invert the order of starting nodes.
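
Until this is fixed, the start order from case 1 can be scripted rather than done by hand. The following is only a rough sketch of such a workaround, not a fix for the underlying sync failure: it polls the Collections API CLUSTERSTATUS action on node A and returns once a replica of the shard is reported as an active leader, at which point node B could be started. The helper names `find_leader` and `wait_for_leader`, the timeout values, and the assumption that the response has the usual `cluster` -> `collections` -> `shards` -> `replicas` layout with a `leader` flag are mine and not taken from the report.

```python
import json
import time
import urllib.request


def find_leader(cluster_status, collection, shard):
    """Return the name of the active leader replica of a shard, or None.

    `cluster_status` is the parsed JSON of a CLUSTERSTATUS response.
    The layout read here is an assumption about that response.
    """
    replicas = (
        cluster_status.get("cluster", {})
        .get("collections", {})
        .get(collection, {})
        .get("shards", {})
        .get(shard, {})
        .get("replicas", {})
    )
    for name, replica in replicas.items():
        # The leader flag is normally the string "true"; accept a boolean too.
        is_leader = str(replica.get("leader", "false")).lower() == "true"
        if is_leader and replica.get("state") == "active":
            return name
    return None


def wait_for_leader(base_url, collection, shard, timeout_s=300, poll_s=5):
    """Poll CLUSTERSTATUS until the shard has an active leader.

    `base_url` is the Solr root of the node started first,
    for example "http://192.168.15.20:8986/solr".
    """
    url = (
        f"{base_url}/admin/collections"
        f"?action=CLUSTERSTATUS&collection={collection}&wt=json"
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                leader = find_leader(json.load(resp), collection, shard)
            if leader is not None:
                return leader
        except (OSError, ValueError):
            # Node not reachable yet, or the response was not valid JSON.
            pass
        time.sleep(poll_s)
    raise TimeoutError(f"no active leader for {collection}/{shard}")
```

With the names from the log above, the call would be `wait_for_leader("http://192.168.15.20:8986/solr", "alldata", "shard1")`, followed by starting node B once it returns.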