Avro / AVRO-625

RPC: permit out-of-order responses

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java, spec
    • Labels:
      None

      Description

      It should be possible, when using a stateful, connection-based transport, for a client to complete a second request over a connection before the first request has returned. In other words, responses should be permitted to arrive out-of-order.

        Issue Links

          Activity

          Doug Cutting added a comment -

          I think it is possible to implement this as an optional, back-compatible feature as follows:

          • clients may add a "call-id" metadata field to requests, uniquely identifying the call within the connection.
          • if no call-id is present in a request, servers must respond in-order.
          • if a call-id is present in a request, and a server supports out-of-order responses, the server must copy the "call-id" to the response.
          • if no call-id is present in a response, then the response must be in-order.

          Simple clients and servers need not specify call ids.

          A client which can handle out-of-order responses might keep a FIFO queue of outstanding call-ids so that it can match responses that lack a call-id to their calls in the order they were sent.

          A server which implements out-of-order responses, when requests lack call-ids, might block subsequent requests over the connection until each response has been sent.
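The client-side bookkeeping described above can be sketched roughly as follows. This is a hypothetical illustration, not Avro API: `CallTracker`, `sent`, and `received` are invented names, and responses stand in as plain strings. Calls that carried a call-id are matched by id; responses without a call-id complete outstanding id-less calls in the order they were sent, since such responses must arrive in-order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of client-side call correlation over one connection.
// Calls sent with a call-id are matched by id; calls sent without one are
// completed in send (FIFO) order, since the server must answer them in-order.
class CallTracker {
    private final Map<Long, CompletableFuture<String>> byId = new HashMap<>();
    private final Deque<CompletableFuture<String>> inOrder = new ArrayDeque<>();

    // Register an outstanding call; callId == null means no call-id was sent.
    synchronized CompletableFuture<String> sent(Long callId) {
        CompletableFuture<String> f = new CompletableFuture<>();
        if (callId != null) {
            byId.put(callId, f);
        } else {
            inOrder.addLast(f);
        }
        return f;
    }

    // A response arrived; callId == null means the server replied in-order,
    // so it completes the oldest outstanding id-less call.
    synchronized void received(Long callId, String response) {
        CompletableFuture<String> f =
            (callId != null) ? byId.remove(callId) : inOrder.pollFirst();
        if (f != null) {
            f.complete(response);
        }
    }
}
```

A real implementation would also need to fail all outstanding futures when the connection closes, and to generate call-ids unique within the connection (a simple per-connection counter would do).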

          ryan rawson added a comment -

          Note that in HBase land we have suffered heavily from the one-socket-per-RPC-endpoint-pair policy. In highly multithreaded apps our users find that they can't get the performance they want because of this. A few people have independently invented mechanisms to make connections thread-local, which gives massive speedups in their clients.

          In other words, multiplexing multiple RPCs on the same socket is not a clear gain when the RPC request and/or response may be significantly large.
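The thread-local workaround mentioned above can be sketched as follows. This is a hypothetical illustration of the pattern, not HBase's actual code: `Connection` here is a stand-in for a real socket, and the counter just makes it observable that each thread gets its own instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the thread-local connection pattern: each thread
// lazily opens its own "connection" instead of contending on one shared
// socket for the whole process.
class PerThreadConnections {
    static final AtomicInteger opened = new AtomicInteger();

    static class Connection {
        final int id = opened.incrementAndGet(); // stands in for a real socket
    }

    // Each thread's first get() creates its own Connection; later get()s on
    // the same thread return that same instance.
    static final ThreadLocal<Connection> conn =
        ThreadLocal.withInitial(Connection::new);
}
```

The trade-off is more open sockets (one per thread rather than one per endpoint), in exchange for removing the contention and head-of-line blocking of a single shared connection.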

          Doug Cutting added a comment -

          Ryan, it sounds like we should make it possible to avoid connection sharing too. A simple approach might be to continue to have a connection per client object, as Avro does today. (Hadoop has a static cache of connections that are shared among client objects.) Then, if a single proxy is used from many threads, it would share the connection, but if each thread uses its own client object it would have its own connection. Would that suffice?

          Note that, with the current Avro approach, once a connection is closed then its client instance may no longer be used: connections are not automatically re-opened as they are in Hadoop. I don't know yet if this is a bug or a feature.

          Doug Cutting added a comment -

          Ryan, I'm curious, what is the performance bottleneck with HBase when connections are shared? In theory, with speed-of-light and switch delays, multiple TCP streams are faster than a single stream, and perhaps that's what's seen here, but I didn't think such effects came into play within a data center. Also, if responses are streamed off of disk directly to the socket, then, in theory, serializing responses over a single connection would limit throughput to the speed of a single disk. But RPC responses are typically buffered before they're written, so I wouldn't expect that to be a factor. Do you know what the actual bottleneck is?

          ryan rawson added a comment -

          Not sure exactly, but when multiple threads (let's say 50) all share one socket to talk to one regionserver (in my tests), the speed is capped below 13k ops/sec. Freeing that, along with some server-side fixes, we can hit past 21k. The thing to remember is that HBase doesn't stream data from disk ever - we read & cache data from HDFS, and in my test, the test is 100% memory cached. Also, the more data you have to stream, the worse things become, mostly because it takes a while to read out 64 kbytes into a local array.

          The memory caching is key here, since everything comes from RAM we should be able to serve at insane speeds, but this is just not the case, in part due to this connection sharing, also in part due to server-side issues (Listener actually reading & deserializing data for all connections) and so on.

          Doug Cutting added a comment -

          My hope is that we can standardize on a single Avro RPC wire format that both:

          • permits out-of-order responses; and
          • permits encryption, authentication and authorization.

          Currently NettyTransceiver provides the former and SaslSocketTransceiver provides the latter.

          My original proposal above was to add out-of-order to SaslSocketTransceiver in a back-compatible manner. But it seems that lots of folks are already using NettyTransceiver in production. So I wonder if we instead might add SASL negotiation to NettyTransceiver and otherwise standardize on its wire format. The overhead of adding this is very small when security features are not used, just adding a few bytes to the first request and response.

          http://avro.apache.org/docs/current/sasl.html#anonymous

          To implement this I'd:

           • add a new SaslTransceiver that updates SaslSocketTransceiver to be compatible with NettyTransceiver;
           • add a new SaslNettyTransceiver that adds the anonymous SASL handshake to NettyTransceiver;
           • deprecate the other transceivers.

           This simple change would mean that folks would still have to choose between out-of-order responses and security, but the two would use a compatible format, so that eventually one or the other could be extended to support both without breaking existing applications. The specification would then contain a single wire format that supports both: a single alternative to HTTP.

          Does this sound like a reasonable approach?

          Philip Zeyliger added a comment -

           It'd be great to get a non-Java implementation as well. Today, I think there are only HTTP-transceiver clients in Python.

          Doug Cutting added a comment -

           I think the lack of a single non-HTTP standard has discouraged non-Java implementations. So settling on a standard wire format that can support both security and out-of-order responses is, I hope, a first step towards getting more non-HTTP, non-Java implementations.

          Mark Farnan added a comment -

          This looks quite interesting.

           I am currently working on C# implementations for TCP sockets and a new one for WebSockets (as per RFC 6455).
           Will see if I can work this in with that.

          Doug Cutting added a comment -

          Perhaps we should use SPDY for Avro?

          SPDY provides secure sessions that multiplex many streams over a single connection. A handshake need only be performed once per session. Each request can use a new stream, so responses can arrive out-of-order. Both synchronous and asynchronous APIs could easily be supported.

           For Java, Jetty provides a client and server implementation. There are SPDY libraries for C, Ruby & Python. For C# the best I can find is the one referenced from:
           http://mail-archives.apache.org/mod_mbox/tomcat-dev/201205.mbox/%3C002601cd3a90$af3e74e0$0dbb5ea0$@preisser@t-online.de%3E


            People

             • Assignee:
               Doug Cutting
             • Reporter:
               Doug Cutting
             • Votes:
               0
             • Watchers:
               12
