Thrift / THRIFT-146

Make Thrift EventMachine Compatible

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: Ruby - Library
    • Labels:
      None
    • Environment:

      Ruby and EventMachine

    • Patch Info:
      Patch Available

      Description

      I wrote a prototype EventMachine client/server for Thrift (BinaryProtocol), and while it's not fully functional, the results are so promising that I think it would be worth it to either include an EM client/server with the thrift distribution, or at least rework the libraries so that they can support EM without having to copy/paste all the code I did.

      I'm attaching my prototype as an example. It requires the eventmachine and statemachine gems. It's wired to support a simple echo service. It's a little confusing in that I reused the same statemachine for both client and server, doesn't handle exceptions, and has a bug where it'll handle all requests from one client before handling requests from another, but hey, it's a prototype.

      EM uses very different semantics from a typical client/server, and this makes it difficult to wire to the current thrift protocols. For example, while currently you can call prot.read to read n bytes off of a stream, or wait until they're available, with EM, if n bytes aren't available, you simply don't handle the call yet. I handled this by trying to read as many bytes as needed and keeping track of how far to back up in the stream, only transitioning to the next state in the statemachine if we were able to read all the bytes needed. This actually works fairly well, but it would be better if we had a byte count at each step so we could tell right away if we had enough data.
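The read-or-backup idea described above can be sketched roughly as follows. This is a minimal illustration, not the attached prototype: `RewindableBuffer`, `transaction`, and `read_string` are hypothetical names, and the length-prefixed string is only an example payload.

```ruby
# Hypothetical helper (not part of Thrift or the attached prototype):
# a buffer that either returns exactly n bytes or leaves its read
# position untouched so the caller can retry once more data arrives.
class RewindableBuffer
  def initialize
    @data = String.new(encoding: Encoding::BINARY)
    @pos = 0
  end

  # Append bytes delivered by the event loop.
  def <<(chunk)
    @data << chunk.b
    self
  end

  # Return n bytes and advance, or nil if fewer than n are buffered.
  def read(n)
    return nil if @data.bytesize - @pos < n
    bytes = @data.byteslice(@pos, n)
    @pos += n
    bytes
  end

  # Run the block; if it returns nil (a read came up short),
  # rewind to the position we started from.
  def transaction
    start = @pos
    result = yield self
    @pos = start if result.nil?
    result
  end
end

# Try to parse a 4-byte big-endian length followed by that many bytes.
def read_string(buf)
  buf.transaction do |b|
    header = b.read(4)
    next nil if header.nil?
    b.read(header.unpack1('N'))
  end
end

buf = RewindableBuffer.new
buf << [5].pack('N') << 'he'
first = read_string(buf)   # too little data so far; position is rewound
buf << 'llo'
second = read_string(buf)  # the whole string is now available
```

With a byte count known up front, the `transaction` wrapper would be unnecessary: the caller could check the buffered size once and skip the partial read entirely.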

      The use of EM allows for new asynchronous semantics for thrift calls, in that the client returns right away from a call, and its callback is called when a result (or exception) is available. Backwards compatibility could still be achieved by blocking on a call, but at least in my experience, in most cases having callbacks would be a huge optimization. There are some cases where I do want to either ensure that calls happen sequentially, or are at least handled in the same order.

      These are the changes I believe would have to be made in order to support an EM thrift connection:
      1. syntax to indicate that certain methods can be called asynchronously with callbacks. Since async is already a reserved keyword, and is changing to noreturn, what about using this?
      2. I would propose moving all reading and writing to the same class. So instead of having ThriftStruct#read, we would have Protocol#read_struct. It seems out of place to me to have a model handle its own reading / writing.
      3. support for byte counts in the protocols. It would be helpful to have a clear way to get the number of bytes that must be read to retrieve message headers, arguments, and results. Alternatively, the protocols could support read_or_backup semantics, but as you can see in my prototype, this can get messy.

      Please let me know what you think about this proposal, and the possibility of using EventMachine.

        Activity

        Ben Taitelbaum added a comment -

        prototype echo client/server using eventmachine and binaryprotocol

        Bryan Duxbury added a comment -

        It looks like there was never too much interest in this issue, and Ben hasn't done any further work on it, to my knowledge. To boot, it calls for some pretty complex changes to a variety of larger Thrift constructs.

        Should we close this issue "wontfix" for the time being? I don't see much benefit to leaving this just hanging here.

        Ben Taitelbaum added a comment -

        Yeah, I determined that the way I was doing it, while it could have worked very well, was just too much effort, and not in line with the thrift architecture. The way the python implementation works with Twisted should be a better approach.

        It would be extremely useful to be able to make multiple asynchronous calls in parallel through the same client, and use a callback mechanism without having to use threads, but do you think this should be part of Thrift, or something that sits on top of Thrift?

        Bryan Duxbury added a comment -

        It could still be a part of Thrift, but I would imagine it'd just be a different kind of Client implementation on top of the existing one. If that's what you're looking for, then maybe we should spec that out in a separate issue, as this one is tailored to EventMachine.

        Esteve Fernandez added a comment -

        It shouldn't be too hard. We (Fluidinfo) had planned to do it and plug it into AMQP. I could do it myself in a fairly reasonable time (or lend a helping hand), but I would like to know if there's any interest and if it would have any chance of being incorporated into Thrift in the end.

        Kevin Clark added a comment -

        Sorry to take so long to weigh in on this. I'm open to the contribution as long as it doesn't break other code. If it breaks other code, we're going to need to justify it.

        Ryan King added a comment -

        There's also an implementation of this in our ruby thrift_client library:

        http://github.com/fauna/thrift_client/blob/master/lib/thrift_client/event_machine.rb

        Might be worth looking at.

        Bryan Duxbury added a comment -

        Ryan - if someone on your side wanted to make a push to contribute your client library, I'd help you to do so.

        Ryan King added a comment -

        I'll ask around.

        Mike Perham added a comment -

        I wrote the EventMachine support for thrift_client. It uses Fibers available in Ruby 1.9 to simulate a synchronous API. Unfortunately I'm not using Cassandra or thrift_client these days so I don't have time to work on this but I'm happy to answer questions if someone else wants to take over.
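The Fiber technique mentioned above can be illustrated without EventMachine at all. This is a minimal sketch of the general pattern, not the thrift_client code: `FakeAsyncClient`, `echo_async`, `deliver_next`, and `echo_sync` are hypothetical names, and `deliver_next` stands in for the reactor delivering a response.

```ruby
require 'fiber' # needed for Fiber.current on Ruby versions before 3.1

# Hypothetical stand-in for an async client: the block passed to
# #echo_async is stored and invoked later, as an event loop would do.
class FakeAsyncClient
  def initialize
    @pending = []
  end

  def echo_async(msg, &on_result)
    @pending << -> { on_result.call("echo: #{msg}") }
  end

  # Simulates the reactor delivering one response.
  def deliver_next
    job = @pending.shift
    job.call if job
  end
end

# Wrap the callback API so it reads like a blocking call when run
# inside a Fiber: yield until the callback resumes us with the result.
def echo_sync(client, msg)
  fiber = Fiber.current
  client.echo_async(msg) { |result| fiber.resume(result) }
  Fiber.yield
end

client = FakeAsyncClient.new
results = []
Fiber.new do
  results << echo_sync(client, 'a')
  results << echo_sync(client, 'b')
end.resume

client.deliver_next # resumes the fiber with the first response
client.deliver_next # resumes the fiber with the second response
```

The code inside the Fiber is written sequentially, while the surrounding loop stays free to do other work between the two deliveries.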

        Brian Takita added a comment -

        I'm interested in using the EventMachine thrift server. Is the implementation in line with thrift-core? Has anybody used it in their environment?

        Ryan King added a comment -

        I know people have used it (I haven't personally), and it is probably worthy of inclusion in the core.

        Peter Sanford added a comment -

        I've attached a patch series implementing an EventMachine client for thrift. The work is based on the Twisted and NodeJS implementations.

        There is a new compiler namespace rb:eventmachine that will generate the EM-specific bindings.

        All of the thrift calls will return an EventMachine::Deferrable object. Once the service has responded, the deferrable's .callback function will be executed with the resulting value if there is one. Here's a sample:

        require 'eventmachine'
        require 'gen-rb.eventmachine/example_client'
        require 'thrift/transport/event_machine_transport'
        
        EM.run do
          connection = Thrift::EventMachineTransport.connect(ExampleClient, host, port)
          connection.errback { puts "Could not connect to server"; EM.stop_event_loop; }
          connection.callback do |client|
            client.rpc_method_a(value).callback {|result| puts "got result #{result}"; }
          end
        end
        

        There is a real example in tutorial/rb/RubyClientEventMachine.rb. There are also roundtrip tests in event_machine_client_spec.rb that provide in-depth examples.

        Note that 'EventMachine' is a development_dependency in the gemspec instead of a runtime dependency. Since most existing projects do not use EventMachine it probably doesn't make sense to force them to install the EM gem. If you want to use this you will need to require eventmachine yourself. You will also need to require 'thrift/transport/event_machine_transport' as it references EventMachine classes so it is not required by default in thrift.rb.

        Peter Sanford added a comment -

        I'd like to get the above patch stream committed. What needs to be done to make that happen?

        Jake Farrell added a comment -

        Peter, any more test cases and documentation you would like to provide? I'll start reviewing this

        Peter Sanford added a comment -

        If there are any test cases that you think I've missed I would be happy to add them.

        For documentation, do you mean end user documentation or code level documentation? I'll add a brief overview of how the different components interact later tonight.

        Peter Sanford added a comment -

        This is an overview of the code changes:

        • Compiler changes to generate async compatible bindings (t_rb_generator.cc)
        • EventMachineTransport that handles the interaction between EM and Thrift bindings

        EventMachineTransport is the glue between EM and Thrift. EventMachine needs to control connection management (creating and reading from a socket) so this is used instead of Thrift::Socket. Data that is sent to the client gets passed to EventMachineTransport#receive_data which splits it into frames and then dispatches to the #recv_* method on the generated bindings.

        The bindings that the compiler generates have a number of differences between default ruby and eventmachine modes. I'll use the code generated from the tutorial as an example. The default bindings look like this:

        def ping()
          send_ping()
          recv_ping()
        end
        
        def send_ping()
          send_message('ping', Ping_args)
        end
        
        def recv_ping()
          result = receive_message(Ping_result)
          return
        end
        

        This sends the ping message, waits for the response and then returns. With EM the code sends the ping message, and then returns an EventMachine::Deferrable object immediately:

        def ping()
          @seqid += 1
          d = @callbacks[@seqid] = deferrable
          send_ping()
          return d
        end
        
        def send_ping()
          send_message('ping', Ping_args)
        end
        
        def recv_ping(iprot, mtype, rseqid)
          d = @callbacks.delete(rseqid)
          if mtype == Thrift::MessageTypes::EXCEPTION
            x = ApplicationException.new
            x.read(iprot)
            iprot.read_message_end
            d.fail(x)
            return
          end
          result = Ping_result.new
          result.read(iprot)
          iprot.read_message_end
          d.succeed
          return
        end
        

        When there is a response, EventMachineTransport will dispatch to #recv_ping which will then execute user supplied callbacks on the deferrable object.
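The sequence-id dispatch can be tried in isolation with a small stand-in. This is a sketch under assumptions, not the generated code: `TinyDeferrable` only mimics the callback/errback/succeed/fail part of EventMachine::Deferrable, and `PendingCalls`, `start`, and `finish` are hypothetical names.

```ruby
# Minimal stand-in for EventMachine::Deferrable (hypothetical; the real
# module supports more, such as timeouts and multiple callbacks).
class TinyDeferrable
  def callback(&blk)
    @on_ok = blk
    self
  end

  def errback(&blk)
    @on_err = blk
    self
  end

  def succeed(*args)
    @on_ok.call(*args) if @on_ok
  end

  def fail(*args)
    @on_err.call(*args) if @on_err
  end
end

# Hypothetical tracker for in-flight calls, keyed by sequence id.
class PendingCalls
  def initialize
    @seqid = 0
    @callbacks = {}
  end

  # Register a new call and return its seqid and deferrable.
  def start
    @seqid += 1
    d = TinyDeferrable.new
    @callbacks[@seqid] = d
    [@seqid, d]
  end

  # Route a response (or an error) to whoever waits on this seqid.
  def finish(rseqid, value, error: nil)
    d = @callbacks.delete(rseqid)
    return if d.nil?
    error ? d.fail(error) : d.succeed(value)
  end
end

calls = PendingCalls.new
log = []
id1, d1 = calls.start
id2, d2 = calls.start
id3, d3 = calls.start
d1.callback { |v| log << "first: #{v}" }
d2.callback { |v| log << "second: #{v}" }
d3.callback { |v| log << "third: #{v}" }.errback { |e| log << "third failed: #{e}" }

# Responses may arrive in any order; the seqid picks the right callback.
calls.finish(id2, 'b')
calls.finish(id1, 'a')
calls.finish(id3, nil, error: 'boom')
```

Keying the table by sequence id is what lets several calls share one connection without the responses being confused.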

        A few other things of note:

        • EventMachineTransport uses EventMachineFramedReader instead of FramedTransport for reading frames. EventMachineFramedReader knows how to parse frames without needing access to a raw socket for reading (EventMachine does the read for you and then passes the data to the #receive_data callback).
        • When the client binding code needs to generate a new deferrable object it calls EventMachineTransport#deferrable. This is necessary so that we can signal an error (with an error callback) when a client tries to create a new request with a closed connection. It also makes it easy to set up a default for requests.
        • event_machine_transport.rb patches pre 1.0 versions of EM to allow EventMachine::Deferrable#timeout to accept arguments that will be passed to the callback. This matches >= 1.0 behavior and should not cause any issues for users still on a version less than 1.0.
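To illustrate how frames can be parsed from pushed data rather than from a socket, here is a minimal sketch. It is not the EventMachineFramedReader from the patch: `FrameSplitter`, `feed`, and `frame` are hypothetical names, and the only assumption about the wire format is the framed transport's 4-byte big-endian length prefix.

```ruby
# Hypothetical sketch of frame splitting for data that is pushed to us
# in arbitrary chunks, as an event loop's receive_data callback does.
class FrameSplitter
  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Feed one chunk and return every complete frame payload that is
  # now available, in arrival order. Partial frames stay buffered.
  def feed(chunk)
    @buffer << chunk.b
    frames = []
    loop do
      break if @buffer.bytesize < 4            # length prefix incomplete
      len = @buffer.unpack1('N')               # 4-byte big-endian length
      break if @buffer.bytesize < 4 + len      # payload incomplete
      frames << @buffer.byteslice(4, len)
      @buffer = @buffer.byteslice(4 + len, @buffer.bytesize)
    end
    frames
  end
end

# Build a frame the way a framed writer would: length, then payload.
def frame(payload)
  [payload.bytesize].pack('N') + payload.b
end

splitter = FrameSplitter.new
wire = frame('ping') + frame('pong!')
part1 = splitter.feed(wire[0, 6])   # prefix plus two payload bytes
part2 = splitter.feed(wire[6..-1])  # the rest of both frames
```

Because the length is known before the payload is touched, the splitter never has to rewind: it waits until the whole frame is buffered and then hands it on.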

          People

          • Assignee:
            Unassigned
            Reporter:
            Ben Taitelbaum
          • Votes:
            0
            Watchers:
            5
