  Samza / SAMZA-184

Add thin multi-language support for SamzaContainer



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: container


      There has been some interest in supporting languages that do not run on the JVM. We have already opened SAMZA-18, which proposes a C implementation of SamzaContainer.

      A second solution to this problem is a StreamTask implementation that starts a child process written in another language and acts as a bridge between that child process and the Java-based Samza APIs. This is how both Storm's multi-language protocol [1] and Hadoop Streaming work.
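
To make the bridge idea concrete, here is a minimal sketch of the parent side. It assumes the simplest transport (the child's stdin/stdout) and newline-delimited text messages. `ChildProcessBridge` and its `send` method are hypothetical names, not part of Samza; a real bridge would presumably call something like `send` from `StreamTask.process` and forward the reply to the `MessageCollector`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Samza: owns one child process and
// exchanges newline-delimited messages with it over stdin/stdout.
class ChildProcessBridge implements AutoCloseable {
    private final Process child;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    ChildProcessBridge(List<String> command) {
        try {
            // The child's stderr goes to the parent's stderr so its logs stay visible.
            child = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        toChild = new BufferedWriter(
                new OutputStreamWriter(child.getOutputStream(), StandardCharsets.UTF_8));
        fromChild = new BufferedReader(
                new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8));
    }

    // Sends one message and blocks until the child answers with one line.
    // Assumes the message itself contains no newline.
    String send(String message) {
        try {
            toChild.write(message);
            toChild.write('\n');
            toChild.flush();
            String reply = fromChild.readLine();
            if (reply == null) {
                throw new IllegalStateException("child process closed its stdout");
            }
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            toChild.close(); // signals EOF on the child's stdin
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            child.destroy();
        }
    }
}
```

On a Unix-like system, `new ChildProcessBridge(List.of("cat"))` gives a trivial echo child for trying this out. The strict one-request, one-reply exchange is a simplification: a real task may emit zero or many output messages per input, so the protocol would need an explicit end-of-response marker.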

      A lot of design decisions need to be fleshed out to support this, but most people on the mailing list were very supportive of this approach [2].

      Things that need to be decided:

      1. Should we start one subprocess per SamzaContainer, or one subprocess per StreamTask?
      2. How should the parent interact with the subprocess at both the transport level (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and the serialization level (protobuf, JSON, etc.)?
      3. What should the protocol look like? We should ideally support all of the operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
      4. Should the child process receive the messages in batches, or one at a time?
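
As a starting point for discussing questions 2 to 4, the sketch below shows one possible wire format: one JSON object per line, with an `op` field named after the task lifecycle methods. Everything in it is an assumption made for illustration (the `MultiLangProtocol` class, the field names, and the use of strings as message payloads); none of it is an agreed Samza protocol.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical wire format: one JSON object per line. The operation names
// mirror StreamTask, InitableTask, WindowableTask and ClosableTask.
final class MultiLangProtocol {
    enum Op { INIT, PROCESS, WINDOW, CLOSE }

    // Escapes the characters that would break a one-line JSON string.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Lifecycle commands carry no payload in this sketch; a real init
    // command would probably also carry the task configuration.
    static String command(Op op) {
        return "{\"op\":" + quote(op.name().toLowerCase(Locale.ROOT)) + "}";
    }

    // A process command carries a batch of messages from one stream.
    // A batch of size one is the "one message at a time" case.
    static String process(String stream, List<String> messages) {
        StringBuilder out = new StringBuilder("{\"op\":\"process\",\"stream\":")
                .append(quote(stream))
                .append(",\"messages\":[");
        for (int i = 0; i < messages.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(quote(messages.get(i)));
        }
        return out.append("]}").toString();
    }
}
```

Modelling `process` as a batch means question 4 does not have to be settled in the message format itself; the parent can choose the batch size at runtime. A binary serialization such as protobuf would need a different framing, for example length-prefixed records instead of lines.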

      It'd be good to get a draft proposal up on the Wiki, so we can all discuss this and converge on an implementation.

      [1] http://storm.incubator.apache.org/documentation/Multilang-protocol.html
      [2] http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E


        Attachments

        1. Test.java (4 kB, Martin Kleppmann)

              Assignee: David Chen (davidzchen)
              Reporter: Chris Riccomini (criccomini)
              Votes: 1
              Watchers: 7