Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 7.4, 8.4.1, 9.0
- None
- None
Description
When SolrJ clients enable Kerberos authentication, a request interceptor is set up which wraps the actual HttpEntity in a BufferedHttpEntity. This BufferedHttpEntity, well, buffers the request body in a byte[] so it can be repeated if needed. This works fine for small requests, but when requests get large, storing the entire request in memory causes contention or OutOfMemoryErrors.
The easiest way for this to manifest is to use ConcurrentUpdateSolrClient, which opens a connection to Solr and streams documents out in an ever-increasing request entity until the doc queue held by the client is emptied.
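To make the memory cost concrete, here is a minimal sketch of the difference between streaming a body and buffering it first. It uses only JDK classes: `StreamingBody`, `BufferedBody`, `CountingSink`, and `documents` are hypothetical stand-ins invented for illustration, not the HttpClient or SolrJ classes named above. The point is only that a wrapper which copies the body into a byte[] holds every byte at once, while a streaming write does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BufferingSketch {

    /** Hypothetical stand-in for a streaming request entity. */
    interface StreamingBody {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical stand-in for a buffering wrapper: copies the whole body into a byte[]. */
    static final class BufferedBody implements StreamingBody {
        private final byte[] buffer;

        BufferedBody(StreamingBody source) throws IOException {
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            source.writeTo(copy);          // drains the entire source before anything is sent
            this.buffer = copy.toByteArray();
        }

        int bufferedBytes() {
            return buffer.length;
        }

        @Override
        public void writeTo(OutputStream out) throws IOException {
            out.write(buffer);             // repeatable, at the cost of holding every byte
        }
    }

    /** Sink that counts bytes and retains none of them, like a socket would. */
    static final class CountingSink extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Body that emits docCount documents of docSizeBytes each, one at a time. */
    static StreamingBody documents(int docCount, int docSizeBytes) {
        return out -> {
            byte[] doc = new byte[docSizeBytes];
            for (int i = 0; i < docCount; i++) {
                out.write(doc);
            }
        };
    }

    /** Bytes retained in memory once the body has been wrapped in a BufferedBody. */
    static int bytesHeldAfterBuffering(int docCount, int docSizeBytes) {
        try {
            return new BufferedBody(documents(docCount, docSizeBytes)).bufferedBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        CountingSink sink = new CountingSink();
        documents(1_000, 1_024).writeTo(sink);
        System.out.println("bytes streamed, none retained: " + sink.count);
        System.out.println("bytes retained by buffering:  " + bytesHeldAfterBuffering(1_000, 1_024));
    }
}
```

In the streaming case the only allocation is one document-sized array; in the buffered case the retained size grows with the number of documents written, which is the pattern that matters when the body keeps growing until a queue is drained.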
I ran into this while troubleshooting a DIH run that would reproducibly load a few hundred thousand documents before progress stalled out. Solr never crashed and the DIH thread was still alive, but the ConcurrentUpdateSolrClient used by DIH had its "Runner" thread disappear around the time of the stall and an OOM like the one below could be seen in solr-8983-console.log:
WARNING: Uncaught exception in thread: Thread[concurrentUpdateScheduler-28-thread-1,5,TGRP-TestKerberosClientBuffering]
java.lang.OutOfMemoryError: Java heap space
    at __randomizedtesting.SeedInfo.seed([371A00FBA76D31DF]:0)
    at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
    at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
    at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:213)
    at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:94)
    at org.apache.solr.common.util.ByteUtils.writeUTF16toUTF8(ByteUtils.java:145)
    at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:848)
    at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:932)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:328)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
    at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:616)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
    at org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:764)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
    at org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:705)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:367)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
    at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:223)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:330)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:155)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:91)
    at org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner$1.writeTo(ConcurrentUpdateSolrClient.java:264)
    at org.apache.http.entity.EntityTemplate.writeTo(EntityTemplate.java:73)
    at org.apache.http.entity.BufferedHttpEntity.<init>(BufferedHttpEntity.java:62)
    at org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder.lambda$new$3(Krb5HttpClientBuilder.java:155)
    at org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder$$Lambda$459/0x0000000800623840.process(Unknown Source)
    at org.apache.solr.client.solrj.impl.HttpClientUtil$DynamicInterceptor$1.accept(HttpClientUtil.java:177)
We took heap dumps and were able to confirm that the entire 8 GB heap was taken up by a single massive CUSC request body that was being buffered!
(As an aside, I had no idea that OutOfMemoryErrors could happen without killing the entire JVM. But apparently they can. CUSC.Runner propagates the OOM as it should and the OOM kills the Runner thread. Since that thread is the GC root for the massive BufferedHttpEntity though, a garbage collection frees up most of the heap space and the JVM survives its memory trouble. Solr's oom script never triggers.)
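The thread-level behaviour in that aside can be seen in isolation with a small sketch. It makes one loud assumption: the OutOfMemoryError is thrown directly by the worker instead of being produced by actually exhausting the heap, so it shows only that an Error escaping a worker thread ends that thread and not the JVM. The class and thread names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

public class WorkerErrorSketch {

    /**
     * Starts a worker that dies with a simulated OutOfMemoryError and returns
     * whatever reached the worker's uncaught-exception handler.
     */
    static Throwable runWorkerThatFails() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Assumption for illustration: thrown by hand, not caused by a full heap.
            throw new OutOfMemoryError("simulated: Java heap space");
        }, "runner-sketch");
        worker.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
        worker.start();
        try {
            worker.join();                 // the worker is gone once this returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable error = runWorkerThatFails();
        System.out.println("worker died with: " + error);
        System.out.println("main thread is still running");
    }
}
```

Once the worker has exited, anything reachable only from its stack becomes eligible for collection, which matches the recovery described above.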
I've attached a JUnit test which reproduces the OOM issue by using a "fake" Kerberos config.
Attachments
Issue Links
- relates to
  - SOLR-13270 SolrJ does not send "Expect: 100-continue" header (Open)
  - SOLR-14250 Solr tries to read request body after error response is sent (Closed)