SOLR-7971 (sub-task of SOLR-7927: Indexing large documents requires larger heap than may be necessary)

Reduce memory allocated by JavaBinCodec to encode large strings

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: Response Writers, SolrCloud
    • Labels: None

    Description

      As discussed in SOLR-7927, we can reduce the buffer memory allocated by JavaBinCodec while writing large strings. The relevant comment is linked and quoted below:

      https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420

      The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF (http://www.unicode.org/glossary/#code_point). This is encoded in UTF-16 as surrogate pair \uDBFF\uDFFF, which takes up two Java chars, and is represented in UTF-8 as the 4-byte sequence F4 8F BF BF. This is likely where the mistaken 4-bytes-per-Java-char formulation came from: the maximum number of UTF-8 bytes required to represent a Unicode code point is 4.

      The maximum Java char is \uFFFF, which is represented in UTF-8 as the 3-byte sequence EF BF BF.

      So I think it's safe to switch to using 3 bytes per Java char (the unit of measurement returned by String.length()), like CompressingStoredFieldsWriter.writeField() does.
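
      The bound above can be checked with a few lines of Java. This is a minimal sketch, not the code from any attached patch: the class name Utf8Bound and the helpers maxUtf8Bytes and utf8Length are hypothetical names introduced here for illustration. It compares the actual UTF-8 size of a string with the proposed worst case of 3 bytes per Java char.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Solr or of any SOLR-7971 patch.
public class Utf8Bound {

    // Proposed worst-case UTF-8 size: 3 bytes per UTF-16 char (String.length() units).
    public static long maxUtf8Bytes(String s) {
        return 3L * s.length();
    }

    // Actual UTF-8 encoded size, for comparison against the bound.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String maxChar = "\uFFFF";            // largest single Java char, 3 bytes in UTF-8
        String maxCodePoint = "\uDBFF\uDFFF"; // U+10FFFF as a surrogate pair, 4 bytes in UTF-8

        for (String s : new String[] {"a", "\u00E9", maxChar, maxCodePoint}) {
            System.out.printf("chars=%d utf8=%d bound=%d%n",
                    s.length(), utf8Length(s), maxUtf8Bytes(s));
        }
    }
}
```

      A code point outside the Basic Multilingual Plane needs 4 UTF-8 bytes but occupies two Java chars, so it averages 2 bytes per char and stays under the 3-bytes-per-char bound.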

      Attachments

        1. SOLR-7971-doublepass.patch
          4 kB
          Shalin Shekhar Mangar
        2. SOLR-7971-doublepass.patch
          4 kB
          Noble Paul
        3. SOLR-7971-doublepass.patch
          5 kB
          Shalin Shekhar Mangar
        4. SOLR-7971-directbuffer.patch
          3 kB
          Shalin Shekhar Mangar
        5. SOLR-7971-directbuffer.patch
          4 kB
          Shalin Shekhar Mangar
        6. SOLR-7971-directbuffer.patch
          4 kB
          Shalin Shekhar Mangar
        7. SOLR-7971.patch
          1 kB
          Shalin Shekhar Mangar

          People

            Assignee: Shalin Shekhar Mangar
            Reporter: Shalin Shekhar Mangar
            Votes: 0
            Watchers: 5