We tested this patch and unfortunately encountered some serious issues after a few hours at 500 update-batches/sec. Each update batch is 10 docs, so we are writing about 5000 docs/sec in total, relying on autoCommit to commit the updates (no explicit commits).
- Solr 4.3.1 w/ Jetty 9, Java 1.7.
- 3 solr instances, 1 per physical server.
- 1 collection.
- 3 shards.
- 2 replicas (each instance is a leader and a replica).
- Soft autoCommit is 1000ms.
- Hard autoCommit is 15000ms.
After about 6 hours of stress-testing this patch, we see many stalled transactions (stacks below), and the Solr instances start to see each other as down, flooding our Solr logs with "Connection Refused" exceptions. Beyond that, I could not find any obviously useful log entries.
I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch.
Stack where /select seems stalled: http://pastebin.com/Y1NCrXGC
Stack where /update seems stalled: http://pastebin.com/cFLbC8Y9
Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak. My script "normalizes" the ERROR-severity stack traces and returns them in order of occurrence.
Summary of my solr.log: http://pastebin.com/pBdMAWeb
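For anyone who wants to produce a comparable summary from their own logs, here is a minimal sketch of that kind of normalization. It is not the script used for the pastebin above. It assumes each log entry starts with its severity (as in "ERROR - 2013-07-01 12:00:01.002; ..."), and the names `normalize` and `summarize_errors` are made up for illustration.

```python
import re
from collections import OrderedDict

# Assumption: every log entry begins with its severity level; any line that
# does not is treated as a continuation (exception message or stack frame).
ENTRY_START = re.compile(r"^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")
NUMBER = re.compile(r"\d+")


def normalize(line):
    """Replace timestamps and numbers so repeated errors compare equal."""
    line = TIMESTAMP.sub("<ts>", line)
    line = NUMBER.sub("<n>", line)
    return line.strip()


def summarize_errors(lines):
    """Return (normalized_trace, count) pairs in order of first occurrence."""
    counts = OrderedDict()
    current = None  # normalized lines of the ERROR entry being collected

    def flush():
        # Record the entry collected so far, if there is one.
        if current is not None:
            key = "\n".join(current)
            counts[key] = counts.get(key, 0) + 1

    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: store the previous one, keep only ERRORs.
            flush()
            current = [normalize(line)] if match.group(1) == "ERROR" else None
        elif current is not None and line.strip():
            # Continuation line belonging to the current ERROR entry.
            current.append(normalize(line))
    flush()
    return list(counts.items())
```

To run it against a log file, pass the open file object to `summarize_errors` (for example `with open("solr.log") as f: summary = summarize_errors(f)`) and print each trace with its count. Replacing every number with a placeholder is deliberately coarse: it merges traces that differ only in ports, document ids, or line numbers, which may or may not be what you want.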