  1. Apache Avro
  2. AVRO-3183

Do Not Double Buffer Data in DataFileWriter


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.11.0
    • Component/s: java
    • Labels: None

    Description

      DataFileWriter.java
        private void init(OutputStream outs) throws IOException {
          this.underlyingStream = outs;
          this.out = new BufferedFileOutputStream(outs);
          EncoderFactory efactory = new EncoderFactory();
          // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
          this.vout = efactory.binaryEncoder(out, null);
          dout.setSchema(schema);
          buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
          // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
          this.bufOut = efactory.binaryEncoder(buffer, null);
          if (this.codec == null) {
            this.codec = CodecFactory.nullCodec().createInstance();
          }
          this.isOpen = true;
        }
      

      DataFileWriter double-buffers its output: the encoder returned by binaryEncoder keeps its own buffer, and it wraps a BufferedFileOutputStream that buffers again. The second layer only adds redundant overhead. In addition, the buffering inside the encoder returned by binaryEncoder is fairly simplistic and does a poorer job than the buffering in BufferedFileOutputStream.

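      Remove this double buffering by using a 'direct' binary encoder (EncoderFactory.directBinaryEncoder), which does not buffer and writes straight to the stream it wraps.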
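
As a rough illustration of why the extra layer is redundant, here is a minimal stdlib-only sketch. It is an analogy, not Avro code: java.io.BufferedOutputStream stands in for both the buffered encoder and BufferedFileOutputStream, and the names DoubleBufferDemo, CountingStream, and sinkWrites are hypothetical. It counts how many write calls reach the underlying stream with one buffer layer and with two.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

class DoubleBufferDemo {

  /** Pass-through stream that counts the write calls reaching the sink. */
  static final class CountingStream extends FilterOutputStream {
    int calls;

    CountingStream(OutputStream out) {
      super(out);
    }

    @Override
    public void write(int b) throws IOException {
      calls++;
      out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      calls++;
      out.write(b, off, len);
    }
  }

  /**
   * Writes 1000 single bytes through one or two buffer layers and returns
   * the number of write calls that reached the sink.
   */
  static int sinkWrites(boolean doubleBuffered, ByteArrayOutputStream sink) throws IOException {
    CountingStream counter = new CountingStream(sink);
    // First layer: plays the role of BufferedFileOutputStream (assumption).
    OutputStream out = new BufferedOutputStream(counter, 8192);
    if (doubleBuffered) {
      // Second layer: plays the role of the buffered encoder (assumption).
      out = new BufferedOutputStream(out, 8192);
    }
    for (int i = 0; i < 1000; i++) {
      out.write(i & 0xFF);
    }
    out.flush();
    return counter.calls;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream single = new ByteArrayOutputStream();
    ByteArrayOutputStream twice = new ByteArrayOutputStream();
    System.out.println("one layer:  " + sinkWrites(false, single) + " sink write(s)");
    System.out.println("two layers: " + sinkWrites(true, twice) + " sink write(s)");
    System.out.println("same bytes: " + Arrays.equals(single.toByteArray(), twice.toByteArray()));
  }
}
```

In this sketch both configurations deliver the same bytes to the sink in the same number of calls, so the second buffer saves no underlying writes; it only copies every byte one more time on the way down.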


          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)
