  1. Solr
  2. SOLR-4227

StreamingUpdateSolrServer does not buffer OutputStreamWriter with BufferedWriter, causing encoding explosion

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Java 1.6, Linux. I am running Solr 3.2, but the code does not appear to differ in 3.5.

      Description

      org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer line 112 is:
      OutputStreamWriter writer = new OutputStreamWriter(out, "UTF-8");
      and then we call
      req.writeXML( writer );
      Because the writer is not buffered, the XML writer invokes the UTF-8 encoder for every atom it writes, for example in org.apache.solr.common.util.XML.writeXML:
      out.write('<');
      For each such call the stream encoder allocates a char array to hold the character, and
      sun.nio.cs.StreamEncoder.implWrite allocates a CharBuffer to wrap it, all for a single character.

      This is particularly a problem when many threads (100?) are writing to the Solr server: together they rapidly eat up all the CPU.

      It would be helpful to allocate the writer as a BufferedWriter, so encoding only happens when you flush. JavaDoc for OutputStreamWriter recommends this: "For top efficiency, consider wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent converter invocations."
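
The suggestion above can be illustrated with a small, self-contained sketch. It is not the attached patch and does not use any SolrJ class: `writeDoc` is a hypothetical stand-in for `req.writeXML( writer )`, and `CountingWriter` is a hypothetical helper that counts how many write calls would reach the `OutputStreamWriter` (and hence the encoder). It assumes the default `BufferedWriter` buffer size, which comfortably holds the tiny document written here.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class BufferedWriterSketch {

    /** Hypothetical helper: counts the write calls forwarded to the wrapped writer. */
    static final class CountingWriter extends Writer {
        private final Writer delegate;
        int writeCalls = 0;

        CountingWriter(Writer delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            writeCalls++; // every write(int) and write(String) ends up here
            delegate.write(cbuf, off, len);
        }

        @Override
        public void flush() throws IOException {
            delegate.flush();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    /** Hypothetical stand-in for req.writeXML(writer): emits the XML atom by atom. */
    static void writeDoc(Writer out) throws IOException {
        out.write('<');
        out.write("doc");
        out.write('>');
        out.write("hello");
        out.write("</doc>");
    }

    /** Returns how many write calls reach the OutputStreamWriter for one document. */
    static int encoderCalls(boolean buffered) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CountingWriter counting = new CountingWriter(new OutputStreamWriter(out, "UTF-8"));
        // With buffering, atoms accumulate in the BufferedWriter until flush().
        Writer writer = buffered ? new BufferedWriter(counting) : counting;
        writeDoc(writer);
        writer.flush();
        return counting.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + encoderCalls(false));
        System.out.println("buffered:   " + encoderCalls(true));
    }
}
```

In this sketch the unbuffered path forwards one call per atom, while the buffered path forwards a single call when the writer is flushed. The bytes written to the underlying stream are the same in both cases; only the number of encoder invocations changes.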

        Attachments

        1. SOLR-4227.patch
          0.8 kB
          Shalin Shekhar Mangar


            People

            • Assignee:
              shalinmangar Shalin Shekhar Mangar
              Reporter:
              ckherrmann Conrad Herrmann
            • Votes:
              0
              Watchers:
              4
