[LUCENE-6779] Reduce memory allocated by CompressingStoredFieldsWriter to write large strings - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.4, 6.0
Component/s: core/codecs
Labels: None
Lucene Fields: New

Description

In SOLR-7927, I am trying to reduce the memory required to index very large documents (between 10 and 100 MB), and one of the places that allocates a lot of heap is the UTF-8 encoding in CompressingStoredFieldsWriter. The same problem existed in JavaBinCodec, and in ~~SOLR-7971~~ we reduced its memory allocation by falling back to a double-pass approach when the UTF-8 size of the string is greater than 64KB.

I propose making the same change to CompressingStoredFieldsWriter that we made to JavaBinCodec in ~~SOLR-7971~~.
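The double-pass fallback described above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not the code from the attached patches: the class and method names (`TwoPassUtf8`, `utf8Length`, `writeUtf8`, `writeString`, `writeVInt`) are made up, `OutputStream` stands in for Lucene's own output abstraction, and the vInt length prefix and the replacement of unpaired surrogates with U+FFFD are assumptions. To decide when a string might exceed 64KB it uses the cheap worst-case bound of 3 UTF-8 bytes per UTF-16 char instead of the exact UTF-8 size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch (not Lucene's API) of writing a length-prefixed UTF-8 string
 * without allocating a byte[] proportional to the whole string when it is large.
 */
public final class TwoPassUtf8 {
  /** Cut-over point taken from the issue text (64KB). */
  public static final int THRESHOLD = 64 * 1024;

  /** Pass 1: count UTF-8 bytes without encoding (unpaired surrogates count as U+FFFD). */
  public static long utf8Length(String s) {
    long len = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        len += 1;
      } else if (c < 0x800) {
        len += 2;
      } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        len += 4; // surrogate pair -> one supplementary code point
        i++;
      } else {
        len += 3; // other BMP char, or unpaired surrogate written as U+FFFD
      }
    }
    return len;
  }

  /** Pass 2: encode into a small reusable buffer and flush it to out as it fills. */
  public static void writeUtf8(String s, OutputStream out) throws IOException {
    byte[] buf = new byte[1024];
    int pos = 0;
    for (int i = 0; i < s.length(); i++) {
      if (pos > buf.length - 4) { // keep room for the longest (4-byte) sequence
        out.write(buf, 0, pos);
        pos = 0;
      }
      char c = s.charAt(i);
      int cp = c;
      if (Character.isHighSurrogate(c) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        cp = Character.toCodePoint(c, s.charAt(++i));
      } else if (Character.isSurrogate(c)) {
        cp = 0xFFFD; // replace unpaired surrogates, matching utf8Length
      }
      if (cp < 0x80) {
        buf[pos++] = (byte) cp;
      } else if (cp < 0x800) {
        buf[pos++] = (byte) (0xC0 | (cp >> 6));
        buf[pos++] = (byte) (0x80 | (cp & 0x3F));
      } else if (cp < 0x10000) {
        buf[pos++] = (byte) (0xE0 | (cp >> 12));
        buf[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        buf[pos++] = (byte) (0x80 | (cp & 0x3F));
      } else {
        buf[pos++] = (byte) (0xF0 | (cp >> 18));
        buf[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        buf[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        buf[pos++] = (byte) (0x80 | (cp & 0x3F));
      }
    }
    out.write(buf, 0, pos);
  }

  /** Variable-length int: 7 bits per byte, high bit set on all but the last byte. */
  public static void writeVInt(OutputStream out, int v) throws IOException {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  /** Writes vInt(byteLength) followed by the UTF-8 bytes. */
  public static void writeString(String s, OutputStream out) throws IOException {
    long maxBytes = 3L * s.length(); // each UTF-16 char needs at most 3 UTF-8 bytes
    if (maxBytes <= THRESHOLD) {
      // Small string: single pass into a bounded temporary buffer, then copy.
      ByteArrayOutputStream tmp = new ByteArrayOutputStream((int) maxBytes);
      writeUtf8(s, tmp);
      writeVInt(out, tmp.size());
      tmp.writeTo(out);
    } else {
      // Large string: measure first, then stream the bytes straight to out.
      writeVInt(out, Math.toIntExact(utf8Length(s)));
      writeUtf8(s, out);
    }
  }
}
```

The trade-off in this sketch is that a large string is scanned twice (once to measure, once to encode), in exchange for a fixed 1 KB scratch buffer instead of a temporary array sized to the whole encoded string.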

Attachments


- LUCENE-6779.patch | 03/Sep/15 20:21 | 8 kB | Shalin Shekhar Mangar
- LUCENE-6779_alt.patch | 04/Sep/15 00:44 | 6 kB | Robert Muir
- LUCENE-6779.patch | 14/Sep/15 13:48 | 8 kB | Shalin Shekhar Mangar
- LUCENE-6779.patch | 14/Sep/15 14:06 | 9 kB | Shalin Shekhar Mangar
- LUCENE-6779.patch | 15/Sep/15 13:45 | 13 kB | Shalin Shekhar Mangar

Issue Links

is required by: SOLR-7927 Indexing large documents requires larger heap than may be necessary (Open)


People

Assignee: Shalin Shekhar Mangar

Reporter: Shalin Shekhar Mangar

Votes: 0

Watchers: 3

Dates

Created: 03/Sep/15 20:20

Updated: 28/Aug/22 14:42

Resolved: 15/Sep/15 15:37