[SOLR-7971] Reduce memory allocated by JavaBinCodec to encode large strings - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.4, 6.0
Component/s: Response Writers, SolrCloud
Labels: None

Description

As discussed in SOLR-7927, we can reduce the buffer memory allocated by JavaBinCodec while writing large strings.

https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420

The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF (http://www.unicode.org/glossary/#code_point). This is encoded in UTF-16 as surrogate pair \uDBFF\uDFFF, which takes up two Java chars, and is represented in UTF-8 as the 4-byte sequence F4 8F BF BF. This is likely where the mistaken 4-bytes-per-Java-char formulation came from: the maximum number of UTF-8 bytes required to represent a Unicode code point is 4.

The maximum Java char is \uFFFF, which is represented in UTF-8 as the 3-byte sequence EF BF BF.

So I think it's safe to switch to using 3 bytes per Java char (the unit of measurement returned by String.length()), like CompressingStoredFieldsWriter.writeField() does.
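The arithmetic above can be checked with a short, self-contained Java sketch. It is illustrative only and is not the code from the attached patches: the class name `Utf8Bound` and its helper methods are hypothetical, and it uses only `String.getBytes(StandardCharsets.UTF_8)` from the JDK to measure encoded sizes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bound {
    // Hypothetical helper: worst-case UTF-8 size for a string whose
    // String.length() is charCount, using the 3-bytes-per-char bound.
    static int maxUtf8Bytes(int charCount) {
        return Math.multiplyExact(charCount, 3); // throws on int overflow
    }

    // Actual number of bytes the JDK produces when encoding s as UTF-8.
    static int actualUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Scans every non-surrogate Java char and returns the largest
    // UTF-8 encoding size observed for a single char.
    static int maxBytesPerSingleChar() {
        int max = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // surrogates are only meaningful in pairs
            }
            max = Math.max(max, actualUtf8Bytes(String.valueOf((char) c)));
        }
        return max;
    }

    public static void main(String[] args) {
        String maxBmpChar = "\uFFFF";         // one Java char
        String maxCodePoint = "\uDBFF\uDFFF"; // U+10FFFF, two Java chars

        System.out.println(actualUtf8Bytes(maxBmpChar));   // 3 bytes for 1 char
        System.out.println(actualUtf8Bytes(maxCodePoint)); // 4 bytes for 2 chars
        System.out.println(maxBytesPerSingleChar());

        // The surrogate pair needs 4 bytes, well under the 3 * 2 = 6 byte bound.
        System.out.println(actualUtf8Bytes(maxCodePoint) <= maxUtf8Bytes(maxCodePoint.length()));
    }
}
```

A surrogate pair costs 4 bytes across two chars (2 bytes per char), so the worst case per Java char comes from the 3-byte BMP range, and sizing a buffer at `3 * String.length()` is sufficient.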

Attachments

SOLR-7971.patch (25/Aug/15 16:29, 1 kB, Shalin Shekhar Mangar)
SOLR-7971-directbuffer.patch (26/Aug/15 11:11, 3 kB, Shalin Shekhar Mangar)
SOLR-7971-directbuffer.patch (26/Aug/15 16:34, 4 kB, Shalin Shekhar Mangar)
SOLR-7971-directbuffer.patch (27/Aug/15 08:00, 4 kB, Shalin Shekhar Mangar)
SOLR-7971-doublepass.patch (31/Aug/15 15:02, 4 kB, Shalin Shekhar Mangar)
SOLR-7971-doublepass.patch (31/Aug/15 19:08, 4 kB, Noble Paul)
SOLR-7971-doublepass.patch (03/Sep/15 15:59, 5 kB, Shalin Shekhar Mangar)

People

Assignee: Shalin Shekhar Mangar

Reporter: Shalin Shekhar Mangar

Votes: 0

Watchers: 5

Dates

Created: 25/Aug/15 16:25

Updated: 09/May/16 18:49

Resolved: 03/Sep/15 20:53