Resolution: Cannot Reproduce
Affects Version/s: 3.6.2, 4.0
Fix Version/s: None
Environment: ManifoldCF, Solr connector, SolrJ, and Solr 4.0 or 3.6 on Mac OS X or Ubuntu; all localhost connections
In ManifoldCF, we've been seeing problems with SolrJ connections throwing java.net.SocketException. See CONNECTORS-616 for details on exactly which varieties of this exception are thrown; "broken pipe" is the most common. This occurs on multiple Unix variants, as stated. (We also occasionally see exceptions on Windows, but they are much less frequent and are different variants from those seen on Unix.)
The exceptions seem to occur while an initial connection is being established, and they seem to occur randomly when multiple connections are being established at the same time. Wire logging shows that only the first few headers are sent before the connection is broken. Solr itself does not log any error. A retry is usually sufficient for the transaction to succeed.
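As an illustration of the "retry is usually sufficient" observation, here is a minimal sketch of an application-level retry around a call that can fail with java.net.SocketException. It is not the ManifoldCF or SolrJ code; the names RetrySketch, IoCall, and withRetry are hypothetical, and the failing call is simulated rather than a real Solr request. Retrying like this is only safe when the request can be repeated without side effects.

```java
import java.io.IOException;
import java.net.SocketException;

public class RetrySketch {
    /** Hypothetical unit of work that may fail with an I/O error, e.g. one indexing request. */
    @FunctionalInterface
    public interface IoCall<T> {
        T run() throws IOException;
    }

    /**
     * Runs the call up to maxAttempts times, retrying only on SocketException
     * (the family that includes "broken pipe"). Other IOExceptions propagate at once.
     */
    public static <T> T withRetry(int maxAttempts, long pauseMillis, IoCall<T> call)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SocketException e) {
                last = e; // remember the failure and try again if attempts remain
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // every attempt failed with a SocketException
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first attempt fails, the second succeeds.
        String result = withRetry(3, 10, () -> {
            if (++calls[0] == 1) {
                throw new SocketException("Broken pipe (simulated)");
            }
            return "indexed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A wrapper of this kind hides the symptom in the same way the HttpClient automatic retries would, so it does not help locate the underlying cause.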
The Solr Connector in ManifoldCF has recently been upgraded to rely on SolrJ, which could be a complicating factor. However, I have repeatedly audited both the Solr Connection code and the SolrJ code for best practices, and while I found a couple of problems, none seems to be of the sort that could cause a broken pipe. For a broken pipe to happen, the socket must be closed either on the client end or on the server end. There appears to be no mechanism for that on the client end, since multiple threads would have to be working with the same socket for that to be a possibility.
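To make the server-end case concrete, the following is a small sketch, using plain java.net sockets on the loopback interface, of how a write fails once the peer has closed the connection. It is an assumption-laden demonstration, not a reproduction of this issue: the class and method names are hypothetical, no Solr or Jetty code is involved, and the exact exception message ("Broken pipe", "Connection reset", and so on) depends on the operating system.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerCloseSketch {
    /**
     * Opens a loopback connection, closes it on the server end, then keeps
     * writing from the client end. Returns the exception raised by the write,
     * or null if no write failed within the attempt limit.
     */
    public static IOException writeToClosedPeer() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The server end accepts the connection and closes it without reading.
            server.accept().close();

            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[1024];
            // The first write may still succeed because the data is only buffered;
            // a later write fails once the client has learned the peer is gone.
            for (int i = 0; i < 200; i++) {
                try {
                    out.write(chunk);
                    out.flush();
                } catch (IOException e) {
                    return e;
                }
                Thread.sleep(10);
            }
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Write failed with: " + writeToClosedPeer());
    }
}
```

Note that the server end raises no error of its own in this sketch, which is consistent with Solr logging nothing while the client sees the exception.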
It is also true that in ManifoldCF we disable the automatic retries that are normally enabled for HttpComponents HttpClient. These automatic retries likely mask this problem should it be occurring in other situations.
Places where there could potentially be a bug, in order of likelihood:
(1) Jetty. Nobody I am aware of has seen this on Tomcat yet, but I also don't know whether anyone has tried it.
(2) Solr servlet. If it is possible for a servlet implementation to cause the connection to drop without any exception being generated, this would be something that should be researched.
(3) HttpComponents/HttpClient. If there is a client-side issue, it would have to be because one HttpClient instance was closing sockets that belong to other instances.