• Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: Response Writers, SolrCloud
    • Labels:


      As discussed in SOLR-7927, we can reduce the buffer memory allocated by JavaBinCodec while writing large strings.

      The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF. This is encoded in UTF-16 as the surrogate pair \uDBFF\uDFFF, which takes up two Java chars, and is represented in UTF-8 as the 4-byte sequence F4 8F BF BF. This is likely where the mistaken 4-bytes-per-Java-char formulation came from: the maximum number of UTF-8 bytes required to represent a Unicode code point is 4.

      The maximum Java char is \uFFFF, which is represented in UTF-8 as the 3-byte sequence EF BF BF.

      So I think it's safe to switch to using 3 bytes per Java char (the unit of measurement returned by String.length()), like CompressingStoredFieldsWriter.writeField() does.
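
      The arithmetic above can be checked with a small standalone sketch. It is not the JavaBinCodec code: the class Utf8BufferBound and its methods are hypothetical names used only for illustration. It encodes the two boundary cases with the standard JDK UTF-8 charset and compares the results against a 3-bytes-per-char bound.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only; not part of Solr or Lucene.
final class Utf8BufferBound {

    // Worst-case UTF-8 size for a string of charCount Java chars,
    // assuming at most 3 bytes per char as argued in this issue.
    static long maxUtf8Bytes(int charCount) {
        return 3L * charCount;
    }

    // Actual number of bytes the JDK produces when encoding s as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String maxChar = "\uFFFF";            // largest single Java char
        String maxCodePoint = "\uDBFF\uDFFF"; // U+10FFFF as a surrogate pair (2 chars)

        // One char encodes to 3 bytes, which meets the bound exactly.
        System.out.println(utf8Length(maxChar) + " <= " + maxUtf8Bytes(maxChar.length()));

        // Two chars encode to 4 bytes, i.e. 2 bytes per char, well under the bound of 6.
        System.out.println(utf8Length(maxCodePoint) + " <= " + maxUtf8Bytes(maxCodePoint.length()));
    }
}
```

      A surrogate pair costs 4 bytes but consumes two chars, so the per-char worst case is the 3-byte encoding of a single char in the range U+0800 to U+FFFF.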


        1. SOLR-7971.patch
          1 kB
          Shalin Shekhar Mangar
        2. SOLR-7971-directbuffer.patch
          4 kB
          Shalin Shekhar Mangar
        3. SOLR-7971-directbuffer.patch
          4 kB
          Shalin Shekhar Mangar
        4. SOLR-7971-directbuffer.patch
          3 kB
          Shalin Shekhar Mangar
        5. SOLR-7971-doublepass.patch
          5 kB
          Shalin Shekhar Mangar
        6. SOLR-7971-doublepass.patch
          4 kB
          Noble Paul
        7. SOLR-7971-doublepass.patch
          4 kB
          Shalin Shekhar Mangar



            • Assignee:
              shalinmangar Shalin Shekhar Mangar
            • Votes:
              0
            • Watchers:
              5


              • Created: