  1. Flink
  2. FLINK-10003

Encoder interface inefficient when wanting to use more sophisticated outputstreams

Details

    Description

      The StreamingFileSink uses the Encoder interface to serialize data.

      public interface Encoder<IN> extends Serializable {
      	void encode(IN element, OutputStream stream) throws IOException;
      }
      

      Except for strings, the implementation must be provided by the user.
      Using any OutputStream implementation that is more convenient than the base OutputStream (such as DataOutputStream) requires creating a new stream for every single record. If the chosen implementation may buffer data, users must additionally call flush().
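
      To illustrate the problem, here is a minimal sketch of what an encoder has to do today. It declares a local copy of the Encoder interface quoted above so that it compiles without Flink; IntEncoder and PerRecordWrapperDemo are hypothetical names, and a ByteArrayOutputStream stands in for the part file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Local copy of the interface quoted in the description (not the Flink class itself).
interface Encoder<IN> extends Serializable {
    void encode(IN element, OutputStream stream) throws IOException;
}

// Hypothetical encoder: writes each Integer as 4 big-endian bytes.
class IntEncoder implements Encoder<Integer> {
    @Override
    public void encode(Integer element, OutputStream stream) throws IOException {
        // Only the raw OutputStream is passed in, so a new wrapper
        // is allocated for every single record.
        DataOutputStream out = new DataOutputStream(stream);
        out.writeInt(element);
        // Required for wrappers that may buffer. close() is deliberately not
        // called, because it would also close the underlying stream.
        out.flush();
    }
}

// Hypothetical driver: calls the encoder once per record, as a sink would.
class PerRecordWrapperDemo {
    static byte[] encodeAll(int... values) {
        ByteArrayOutputStream partFile = new ByteArrayOutputStream(); // stand-in for a part file
        Encoder<Integer> encoder = new IntEncoder();
        try {
            for (int v : values) {
                encoder.encode(v, partFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partFile.toByteArray();
    }
}
```

      For N records this allocates N DataOutputStream instances and performs N flush() calls, even though all records go to the same underlying stream.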

      Instead, we could allow specifying an optional factory for the streams that would be called once per part file, and change the Encoder interface to take a generic type parameter for the output stream.
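
      One possible shape of this proposal is sketched below. It is only an illustration under assumptions: StreamFactory, TypedEncoder and FactoryDemo are hypothetical names that do not exist in Flink, and the driver loop merely imitates how a sink might call them for a single part file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical factory: wraps the raw part file stream once per part file.
interface StreamFactory<S extends OutputStream> extends Serializable {
    S create(OutputStream partFileStream) throws IOException;
}

// Hypothetical variant of Encoder that is generic in the stream type.
interface TypedEncoder<IN, S extends OutputStream> extends Serializable {
    void encode(IN element, S stream) throws IOException;
}

// Hypothetical driver imitating a sink that writes one part file.
class FactoryDemo {
    static byte[] encodeAll(int... values) {
        StreamFactory<DataOutputStream> factory = DataOutputStream::new;
        TypedEncoder<Integer, DataOutputStream> encoder =
                (element, out) -> out.writeInt(element);

        ByteArrayOutputStream partFile = new ByteArrayOutputStream(); // stand-in for a part file
        try {
            // The wrapper is created once and reused for every record.
            DataOutputStream out = factory.create(partFile);
            for (int v : values) {
                encoder.encode(v, out);
            }
            // A single flush when the part file is finished.
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partFile.toByteArray();
    }
}
```

      With this shape the encoder receives the convenient stream type directly, and the decision of when to flush moves from user code to whoever owns the part file.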

          People

            Assignee: Unassigned
            Reporter: Chesnay Schepler (chesnay)