Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
Java 1.8.121, Linux x64
Description
When SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) is used (as can happen with the UnifiedHighlighter in particular) and the document cache already holds the document, the call goes through visitFromCached. For a very large stored string field this fails with an OutOfMemoryError at line 752 of SolrIndexSearcher:
visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
This is due to the current String.getBytes(Charset) implementation, which sizes the byte array it allocates as charArrayLength * maxBytesPerCharacter, where maxBytesPerCharacter is 3 for UTF-8. 3 * 716MB exceeds Integer.MAX_VALUE, the upper bound on a Java array length, so the JVM throws an OutOfMemoryError: a single array of that size cannot be allocated, however much heap is available.
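The arithmetic can be checked with a small standalone sketch. It assumes "716MB" means roughly 716 million chars, and the class and method names (Utf8WorstCase, worstCaseBytes, fitsInSingleArray) are made up for illustration; they are not part of Solr or the JDK. Only CharsetEncoder.maxBytesPerChar() is a real JDK call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only: reproduces the worst-case
// sizing described above (chars * maxBytesPerChar) using long arithmetic.
class Utf8WorstCase {

    // Worst-case number of bytes a UTF-8 encoder may need for charCount chars.
    static long worstCaseBytes(long charCount) {
        // The JDK's UTF-8 encoder reports 3.0 bytes per char as its maximum.
        float maxBytesPerChar = StandardCharsets.UTF_8.newEncoder().maxBytesPerChar();
        return (long) Math.ceil(charCount * (double) maxBytesPerChar);
    }

    // Java array lengths are ints, so Integer.MAX_VALUE is a hard upper bound.
    // (Real JVMs may cap arrays slightly below this; the sketch ignores that.)
    static boolean fitsInSingleArray(long charCount) {
        return worstCaseBytes(charCount) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long chars = 716_000_000L; // assumption: "716MB" read as ~716 million chars
        System.out.println("worst case bytes: " + worstCaseBytes(chars));
        System.out.println("array limit:      " + Integer.MAX_VALUE);
        System.out.println("fits in one array: " + fitsInSingleArray(chars));
    }
}
```

Under this sizing rule, any string longer than Integer.MAX_VALUE / 3 (715,827,882) chars cannot be encoded in one call, even if its actual UTF-8 form would be far smaller.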
The problem is not present when the document cache is disabled.
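For comparison, one way to avoid the worst-case allocation is to count the exact UTF-8 length first and encode into a buffer of exactly that size. The following is only a sketch of that idea, not a proposed patch and not what Solr does; ExactUtf8 and its methods are hypothetical names. It assumes unpaired surrogates should be replaced with '?', which matches the behaviour of String.getBytes(Charset).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class ExactUtf8 {

    // Exact number of UTF-8 bytes needed, counting a lone surrogate as 1 byte ('?').
    static long utf8Length(CharSequence s) {
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                n += 4; // valid surrogate pair encodes one supplementary code point
                i++;    // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                n += 1; // unpaired surrogate is replaced by '?'
            } else {
                n += 3;
            }
        }
        return n;
    }

    // Encodes into a buffer sized by utf8Length instead of length * 3.
    static byte[] encode(CharSequence s) {
        long len = utf8Length(s);
        if (len > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("encoded form exceeds max array length");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate((int) len);
        CoderResult r = enc.encode(CharBuffer.wrap(s), out, true);
        if (!r.isUnderflow()) {
            throw new IllegalStateException("unexpected coder result: " + r);
        }
        enc.flush(out);
        // The buffer should be exactly full; trim defensively if it is not.
        return out.position() == out.capacity()
                ? out.array()
                : Arrays.copyOf(out.array(), out.position());
    }
}
```

The trade-off is an extra pass over the string in exchange for an allocation that matches the real encoded size. A field whose exact UTF-8 form still exceeds the array limit would need a streaming or chunked approach instead.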
Attachments
Issue Links
- is part of SOLR-10117 Big docs and the DocumentCache; umbrella issue (Open)