[SOLR-10375] Stored text retrieved via StoredFieldVisitor on doc in the document cache over-estimates needed byte[] - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Environment: Java 1.8.121, Linux x64

Description

The problem occurs when SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) is called (as can happen with the UnifiedHighlighter in particular).

If the document cache contains the document, doc() calls visitFromCached, which can throw an OutOfMemoryError for a very large stored text field because of line 752 of SolrIndexSearcher: visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));

  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
  at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
  at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
  at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)

This is due to the current String.getBytes(Charset) implementation, which sizes the output byte array at charArrayLength * maxBytesPerCharacter, where maxBytesPerCharacter is 3 for UTF-8. 3 * 716MB exceeds Integer.MAX_VALUE, the upper bound on a Java array length, so the JVM throws an OutOfMemoryError: a single array of that size cannot be allocated, regardless of how much heap is available.
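
To make the arithmetic concrete, here is a minimal sketch of one possible way to avoid the worst-case allocation: count the exact UTF-8 length first, then encode into a buffer of exactly that size. This is not the Solr code and not a proposed patch; the class Utf8Sizing and its methods worstCaseBytes, utf8Length and encodeExact are hypothetical names for illustration, and the 716,000,000 character count is an assumed reading of the "716MB" figure above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Solr or Lucene.
public class Utf8Sizing {

    /** Worst-case size under the chars * maxBytesPerCharacter strategy (3 for UTF-8, per this issue). */
    static long worstCaseBytes(long charCount) {
        return charCount * 3L;
    }

    /** Exact UTF-8 length of s, computed without allocating any output. */
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // a surrogate pair encodes one supplementary code point
                i++;
            } else if (Character.isSurrogate(c)) {
                bytes += 1; // unpaired surrogate is replaced by '?', as String.getBytes does
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    /** Encodes s as UTF-8 into an array of exactly utf8Length(s) bytes. */
    static byte[] encodeExact(String s) {
        long len = utf8Length(s);
        if (len > Integer.MAX_VALUE) { // the real VM limit may be slightly lower
            throw new IllegalArgumentException("UTF-8 form does not fit in one array: " + len);
        }
        ByteBuffer out = ByteBuffer.allocate((int) len);
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CoderResult result = encoder.encode(CharBuffer.wrap(s), out, true);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("unexpected encoder result: " + result);
        }
        result = encoder.flush(out);
        if (!result.isUnderflow() || out.hasRemaining()) {
            throw new IllegalStateException("length estimate did not match encoder output");
        }
        return out.array(); // heap buffer: backing array has exactly len bytes
    }

    public static void main(String[] args) {
        long chars = 716_000_000L; // assumed character count for the "716MB" field
        System.out.println("worst case bytes: " + worstCaseBytes(chars));
        System.out.println("exceeds Integer.MAX_VALUE: " + (worstCaseBytes(chars) > Integer.MAX_VALUE));
        System.out.println("exact bytes for a small sample: " + encodeExact("stored text").length);
    }
}
```

The trade-off in this sketch is a second pass over the characters in exchange for an exact allocation. For mostly-ASCII text the exact size is close to one byte per character, so a field of this size would fit in a single array even though the worst-case estimate does not.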

The problem is not present when the document cache is disabled.

Issue Links

is part of: SOLR-10117 Big docs and the DocumentCache; umbrella issue (Open)

People

Assignee: Unassigned
Reporter: Michael Braun
Votes: 0
Watchers: 5

Dates

Created: 27/Mar/17 18:45
Updated: 08/Jun/19 15:28