Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels:

      Description

      We should develop a high-performance, secure transport for Avro. It should be named with avro: URIs.

      Attachments

      1. aep-001.txt (3 kB) - Doug Cutting

        Issue Links

          Activity

          Doug Cutting added a comment -

          I plan to further develop Java's non-standard socket protocol as a prototype for a future, standard, high-performance, secure RPC transport for Avro. First I hope to implement out-of-order responses (AVRO-625) then to add SASL-based security (AVRO-641).

          Doug Cutting added a comment -

          We don't have a Confluence wiki yet. Someone needs to ask infrastructure to create one and then move all of our content to it. Volunteers welcome.

          To some degree it doesn't matter where it's hosted. We just need a clearly defined process. The PEP process is version-control based, which more naturally keeps it from becoming a free-for-all. But a wiki-based process could be defined. We don't want folks to just add features they'd like to see to a spec without considering implementation costs. A proposed enhancement should have a clear owner, a clear process for integrating changes to the proposal and a clear process for acceptance.

          If maintaining the proposal in Jira proves awkward, then we can commit an early version of it to subversion and then folks can submit patches to it. Or someone can propose and instigate an alternate process.

          Bruce Mitchener added a comment -

          I like that and would be happy to help flesh it out ... would Confluence be a good place for AEPs?

          Doug Cutting added a comment -

          Perhaps we should adopt a PEP-like process for things like this.

          http://www.python.org/dev/peps/pep-0001/

          In that vein, I've attached the beginnings of an Avro Enhancement Proposal (AEP) for this feature.

          Bruce, would you like to help flesh this out? I think we should first come to general agreement about a specification, then implement it in at least two languages, perhaps changing the design as we go, before we declare it as a stable Avro feature.

          Does that sound like a reasonable process?

          Jeff Hammerbacher added a comment -

          > I think we've got enough information in this thread that we should probably all hash it out in front of a whiteboard.

          Did the above whiteboard-fronted hashing take place?

          Philip Zeyliger added a comment -

          > I'd like us to consider describing Avro's protocol in terms of (and here the terminology falls down) an Avro protocol, or at least in terms of Avro records.

          I think this could be deceptive if we don't support Avro versioning features here. I think it's often simpler to bootstrap a system not using itself but using a more primitive system. We'd like the transport to be extensible. Many protocols use a combination of named commands, key/value meta-data and a binary payload to provide extensibility. A description at this level in Avro is possible, but would probably not be able to describe the internal structure of every command's payload. So, in practice, folks might not in fact implement a so-specified transport using Avro, as that might require an extra copy of payload data. So we might stick to using Avro to describe just the command+metadata part, and leave payloads outside of this. Or we might alternately describe the command+metadata part as something as simple as <command>(<newline><name><colon><value><newline>)*<newline><payload>.

          > I think this could be deceptive if we don't support Avro versioning features here.

          I think we have to find a way to bootstrap RPC so that we do support Avro's versioning features on the Avro Rpc Protocol. We're going to want to add things like introspection. The good news is that the bootstrap can be dead simple: we could explicitly version the schemas that servers and clients understand. (With the restriction that versions still have to be forwards and backwards compatible in the Avro sense, but we can force the sender to project back into the older version.) Or we could do something more complicated and exchange the schemas like the current transport does.

          > as that might require an extra copy of payload data.

          If we have to copy bytes, that indicates mostly a limited API, and not a fundamental restriction. Our users will implement things where they wish to avoid byte copying too, so this is as good a use case as any to see if we can pull it away. For this reason, I'm "meh" on the proposal of separating control and data planes, though it might be the simplest way around Avro bytes not being framed.

          > I think we've got enough information in this thread that we should probably all hash it out in front of a whiteboard.

          ryan rawson added a comment -

          Here are some initial thoughts on requirements for applications like hbase... one thing that sets hbase apart from systems like datanodes is that its calls are high volume with small payloads. Some calls have as little as 10-20 bytes of payload and we wish to make them thousands of times a second. Obviously this fits easily into a 1500-byte MTU, but we also want to make sure that deserializing these RPCs doesn't end up being a prohibitive cost. An optimized protocol could eschew all these features by assuming a limited schema set and fewer features, but it would be nice not to have to do so.

          Matt Massie added a comment -

          Todd> look at the avro session as a state machine that flip flops between "control plane" messages and "user data plane" messages

          Sounds good to me. The service number rules at the end of my message sort of speak to that separation. We need to tease out the details to do it better.

          Doug> OSI terminology is not a priority for me. If it is for you, then please propose a patch updating the spec and all of the implementations.

          If the team agrees on different terminology, I will happily do the search-and-replace. Being consistent is important, I agree, but it would be ideal to be consistent with the wider community outside of Avro.

          Doug> Except the Wikipedia page says that HTTP is at the application level

          http://en.wikipedia.org/wiki/OSI_model. RPC is listed as a Layer 5 (Session) example. HTTP is correctly at the Application level, but this Jira is about RPC, not HTTP.

          Doug Cutting added a comment -

          Oops. Matt's quote that I responded to above was meant to be: "Btw, RPC is at the session level with the presentation to the application being Avro serialization."

          Doug Cutting added a comment -

          Matt> look at the avro session as a state machine that flip flops between "control plane" messages and "user data plane" messages

          Except the Wikipedia page says that HTTP is at the application level, not at the session level. Seriously, we just need to pick some terms, define them, and use them consistently. Protocol and Transport both currently have clear, unambiguous definitions in Avro. Alignment with OSI terminology is not a priority for me. If it is for you, then please propose a patch updating the spec and all of the implementations.

          Todd> look at the avro session as a state machine that flip flops between "control plane" messages and "user data plane" messages

          +1 this makes good sense to me.

          Todd Lipcon added a comment -

          In terms of vocabulary, I feel that discovery is more about finding all machines running Avro services (like Bonjour or Zeroconf). The term introspection seems more appropriate here.

          +1. Thrift calls this idea "reflection" though they failed to implement it in a useful manner so I think it actually got pulled out.

          create a base RPC proxy for clients that passes the response bytes "up" to a higher level response processor

          This will be tricky to integrate with streaming RPC, and I think it will make coding somewhat difficult, since in effect we'd have to embed one avro decoder inside another. I think it'll simplify things to look at the avro session as a state machine that flip-flops between "control plane" messages and "user data plane" messages. The user data plane messages are of course themselves avro-encoded, but using the user's protocol and types, rather than one we've defined. The control plane messages provide a sort of framing that connects the user frames to particular calls. Does that make sense?

          Matt Massie added a comment -

          I support the idea of having an Avro RPC specification that is written as much as possible (completely?) in Avro schema. This isn't just good design, it also prevents duplicating work. I agree with Phil that we don't want...

          Instead of saying "and then there shall be a long, encoded like so, and then it shall be followed by that many bytes",...

          There are many good examples of RPC/serialization programs that describe RPC using IDLs. For example, the Internet Communications Engine (http://www.zeroc.com/doc/Ice-3.3.1/manual/Protocol.39.3.html) describes their RPC protocol using ICE (their IDL). SunRPC uses XDR to completely describe RPC (http://www.faqs.org/rfcs/rfc1050.html). There's even an RFC protocol script that pulls all the XDR definitions from an RFC and writes them into a single protocol (.x) file to be run using rpcgen. 1970s tech FTW!

          Here is a straw man to make it a little clearer what I'm proposing here.

          {"type": "record",
           "name": "rpc_message",
           "fields": [
              {"name": "xid", "type": "long"},
              {"name": "auth", "type": "bytes"},
              {"name": "body", "type": [

                 {"type": "record",
                  "name": "rpc_call_message",
                  "fields": [
                    {"name": "rpcvers", "type": "long"},
                    {"name": "service", "type": "long"},
                    {"name": "version", "type": "long"},
                    {"name": "method", "type": "long"}]},

                 {"type": "record",
                  "name": "rpc_response_success",
                  "fields": [
                    {"name": "results", "type": "bytes"}]},

                 {"type": "record",
                  "name": "rpc_version_mismatch",
                  "fields": [
                    {"name": "low_version", "type": "long"},
                    {"name": "high_version", "type": "long"}]},

                 {"type": "record",
                  "name": "rpc_service_unavailable",
                  "fields": [
                    {"name": "reason", "type": "bytes"}]},

                 {"type": "record",
                  "name": "rpc_call_version_mismatch",
                  "fields": [
                    {"name": "low_version", "type": "long"},
                    {"name": "high_version", "type": "long"}]},

                 {"type": "record",
                  "name": "rpc_auth_error",
                  "fields": [
                    {"name": "reason", "type": "bytes"}]} ]}]}


          This example is really just RFC 1050 wrapped up in Avro schema. This schema isn't complete but it's explicit. For example, it says that an rpc_response_success message is nothing but a bunch of bytes. That's okay. We can drill into the details of those opaque bytes in a separate response schema definition. This layering will give us flexibility in the future and make it easier to break RPC into components. For example, in this case, we could easily create a base RPC proxy for clients that passes the response bytes "up" to a higher level response processor. The proxy only needs to know the base RPC schema and nothing more.

          This base is also very light. You could CALL a remote method using as little as 6 bytes sent over whatever transport you like e.g. UDP, TCP, SSL, TCP-over-DNS. Transports only deal in bytes and could not care less about messages (although we may need to define record marking like section 10 of RFC 1050).
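To make the size claim concrete, here is a sketch (mine, not from the thread) of Avro's zig-zag base-128 varint encoding for longs, which is what lets each small-valued field of the straw-man call cost a single byte:

```python
# Avro encodes longs as zig-zag integers in base-128 varints (per the Avro
# spec), so small magnitudes take one byte each on the wire.

def zigzag_varint(n: int) -> bytes:
    """Encode a long as Avro's variable-length zig-zag integer."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: maps small +/- values to small codes
    out = bytearray()
    while True:
        if z & ~0x7F:
            out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
            z >>= 7
        else:
            out.append(z)
            return bytes(out)

# A minimal call under the straw-man schema: xid=1, empty auth (length 0),
# union branch 0 (rpc_call_message), rpcvers=1, service=0 (ping), version=1,
# method=0 -- one byte per field.
msg = b"".join(zigzag_varint(v) for v in (1, 0, 0, 1, 0, 1, 0))
print(len(msg))  # 7
```

With real auth bytes and actual arguments the message grows, but this is the same order of magnitude as the "as little as 6 bytes" figure above.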

          Btw, RPC is at the session level with the presentation to the application being Avro serialization. I agree that using the term Avro transport is confusing and makes me cry elephant tears.

          OSI Layer           Example
          Application         Avro Proxy Object
          Presentation        Avro Binary Serialization
          Session             Avro RPC state machine
          Transport           TCP, UDP, etc.
          Network             IP
          Data-link/Physical  Ethernet

          DISCOVER: Asks the server for information about itself.

          In terms of vocabulary, I feel that discovery is more about finding all machines running Avro services (like Bonjour or Zeroconf). The term introspection seems more appropriate here.

          Aside from introspection, we also need a simple Avro "ping" service. Using the base schema above, we could have a convention that says

          • All service numbers less than zero are reserved for Avro use (for discovery/introspection, ping, etc.).
          • All service numbers greater than zero are for user-defined services.
          • Service number zero is the ping service.
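The convention above could be dispatched with a few lines of code. In this sketch (mine), only ping = 0 and the sign convention come from the comment; the reserved introspection number is an assumption:

```python
# Hypothetical dispatcher for the proposed service-number convention.

BUILTIN_PING = 0
BUILTIN_INTROSPECT = -1  # assumed assignment, not specified in the thread

def dispatch(service: int, user_services: dict):
    """Route a call by service number per the proposed convention."""
    if service == BUILTIN_PING:
        return "pong"
    if service < 0:  # reserved for Avro built-in services
        if service == BUILTIN_INTROSPECT:
            return {"services": sorted(user_services)}
        raise LookupError("unknown reserved service %d" % service)
    if service not in user_services:  # maps to rpc_service_unavailable
        raise LookupError("rpc_service_unavailable: %d" % service)
    return user_services[service]()

print(dispatch(0, {}))  # the ping service -> 'pong'
```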
          Todd Lipcon added a comment -

          Todd, so there's a separate command to initiate a call, and to send each chunk of its data. Is there also a command that terminates a call, i.e., declares that its request or response has no more frames?

          Something like that... Philip, Matt, and I are going to try to brainstorm a bit in the next couple of days and write up some kind of proposal to start a conversation about specifics (the above is intentionally vague)

          Doug Cutting added a comment -

          Todd, so there's a separate command to initiate a call, and to send each chunk of its data. Is there also a command that terminates a call, i.e., declares that its request or response has no more frames?

          Also, the term "Record" above is confusing. "Data" is overloaded too; "Frame" might be used instead.

          Todd Lipcon added a comment -

          I agree that the "payload" should not itself be avro-encapsulated. Instead, I think the stream would look something like:

          client:
          SendCommandRecord for call Foo, callid 1
          CommandData for callid 1, length 234 bytes
          <param 1>
          <param 2>
          <param 3>

          server:
          SendResponseRecord for callid 1
          ResponseData for callid 1, length 12 bytes
          <frame 1 of response>
          ResponseData for callid 1, length 24 bytes
          <frame 2 of response>

          (this is obviously not fleshed out - just trying to describe how I think interleaving of avro "control" records could be done with the user-protocol "payload" records.)
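The interleaved stream above could be sketched with a toy framing layer. The frame tags, the call-id width, and the fixed big-endian header below are my assumptions for illustration, not anything this thread specifies:

```python
# Toy framing for the control/data interleaving sketched above.
import struct

CMD, CMD_DATA, RESP, RESP_DATA = range(4)  # hypothetical frame tags

def frame(tag: int, callid: int, payload: bytes = b"") -> bytes:
    # 1-byte tag + 4-byte call id + 4-byte payload length, then opaque payload
    return struct.pack(">BII", tag, callid, len(payload)) + payload

def parse(stream: bytes):
    frames, off = [], 0
    while off < len(stream):
        tag, callid, length = struct.unpack_from(">BII", stream, off)
        off += 9  # header size
        frames.append((tag, callid, stream[off:off + length]))
        off += length
    return frames

# Client side of the example: initiate call 1, then send its parameter bytes,
# which stay opaque to the framing layer (the user's own avro encoding).
wire = frame(CMD, 1) + frame(CMD_DATA, 1, b"<params>")
print(parse(wire))
```

Because the payload length is carried in the control frame, frames from different call ids can be freely interleaved on one connection, which is the property the streaming-RPC requirement needs.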

          Doug Cutting added a comment -

          > First off, we should probably call this a "protocol".

          It's fair to use a term differently in different contexts. Avro is not using terms in accord with the OSI model. Avro's spec does try to define each term when it's first used, and to use terms consistently within that document. Can't we use the same terminology here? If we'd like to switch terminology, then we should also update the spec and the implementations, no?

          > I'd like us to consider describing Avro's protocol in terms of (and here the terminology falls down) an Avro protocol, or at least in terms of Avro records.

          I think this could be deceptive if we don't support Avro versioning features here. I think it's often simpler to bootstrap a system not using itself but using a more primitive system. We'd like the transport to be extensible. Many protocols use a combination of named commands, key/value meta-data and a binary payload to provide extensibility. A description at this level in Avro is possible, but would probably not be able to describe the internal structure of every command's payload. So, in practice, folks might not in fact implement a so-specified transport using Avro, as that might require an extra copy of payload data. So we might stick to using Avro to describe just the command+metadata part, and leave payloads outside of this. Or we might alternately describe the command+metadata part as something as simple as <command>(<newline><name><colon><value><newline>)*<newline><payload>.
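That textual alternative can be parsed in a few lines. This sketch (mine) reads the command line, then name:value pairs, treating the blank line as the boundary before the opaque payload:

```python
# Illustrative parser for the <command>(<newline><name><colon><value>
# <newline>)*<newline><payload> framing described above.

def parse_message(raw: bytes):
    head, _, payload = raw.partition(b"\n\n")  # blank line ends the headers
    lines = head.split(b"\n")
    command = lines[0].decode()
    meta = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")  # split at the first colon only
        meta[name.decode()] = value.decode()
    return command, meta, payload

cmd, meta, payload = parse_message(b"CALL\ncallid:1\n\n<opaque avro bytes>")
print(cmd, meta)  # CALL {'callid': '1'}
```

The payload stays untouched bytes, which is exactly the property that avoids the extra copy: the command+metadata part is cheap to describe (in Avro or in text), while the payload is handed off verbatim.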

          Todd Lipcon added a comment -

          linking the streaming RPC call JIRA - we'll need to be conscious of that requirement when designing the wire protocol so we can interleave multiple streamed RPCs.

          Philip Zeyliger added a comment -

          Does this need to be first-class in the protocol? Rather, can we reserve a namespace of CALLs that are Avro-scoped? eg org.avro.getServerMetrics(), etc? This seems like it will be less implementation work since it can share code with the rest of RPC.

          We definitely need to put a little more thought into how a server can service many Avro "user protocols" simultaneously; that would allow this. Then you might be able to say "hey, also serve the following built-in services".

          But then how do you discover whether those services are being served? I guess you could call them, and get a MethodNotFound/Supported error.

          AUTHENTICATE could be framed as a CALL as well. Though it may be difficult to integrate SASL here, it's worth exploring.

          I'd draw the line here, though. I think the "user protocol" has things to do with the service being offered, but there are commands that affect the state of the server in a way the client shouldn't be aware of. AUTHENTICATE is one. "SET_COMPRESSION_LEVEL" might be another. These affect how the server behaves for this connection in a way that's transparent to the "user protocol" implementor.

          [bootstrapping]

          Yes, we'd try never to change it? What, you don't believe me?

          Todd Lipcon added a comment -

          First off, we should probably call this a "protocol"

          For the sake of this issue, maybe we discuss "the protocol" vs the "service protocol" or "user protocol", where the former means the stuff here and the latter means the user-specified .avpr protocol?

          DISCOVER: Asks the server for information about itself

          Does this need to be first-class in the protocol? Rather, can we reserve a namespace of CALLs that are Avro-scoped? eg org.avro.getServerMetrics(), etc? This seems like it will be less implementation work since it can share code with the rest of RPC.

          Similarly, AUTHENTICATE could be framed as a CALL as well. Though it may be difficult to integrate SASL here, it's worth exploring.

          I think putting all of the request/response through the "call" mechanism will also simplify some security, request tracing, auditing, etc - it's worth being able to set up a service such that only authenticated subjects with an "ops" group could see metrics. Or, allow a "nagios" principal access to the server metrics/health but nothing else.

          Or having commands able to include subcommands

          Please explain? What's a subcommand?

          We need to support out-of-order responses and "one way" (don't wait for a response) commands.

          Strongly agree. The "streaming RPCs" (AVRO-406) are another requirement that should fit in here. To support streaming RPC along with out-of-order response, we should also be able to interleave response chunks at chunk boundaries.
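To make the interleaving concrete: if every chunk is tagged with its call's serial number, a receiver can demultiplex streamed responses whose chunks arrive interleaved on one connection. A minimal sketch, with all names invented:

```python
# Hypothetical demultiplexer: each frame carries (serial, chunk), so
# chunks of different streamed responses can interleave at chunk
# boundaries on a single socket and still be reassembled per call.
from collections import defaultdict

def demux(frames):
    streams = defaultdict(list)
    for serial, chunk in frames:
        streams[serial].append(chunk)
    return {serial: b"".join(chunks) for serial, chunks in streams.items()}

# Chunks of calls 1 and 2 arriving interleaved:
result = demux([(1, b"he"), (2, b"wor"), (1, b"llo"), (2, b"ld")])
```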

          A simple approach would be to bootstrap it by sending hash(avro protocol schema), and doing much like we do with calls right now.

          So this bootstrap schema would be sent outside the scope of "send things as avro records"? Or it would just be a bootstrap record that we promise to never ever change except at major (incompatible) versions?

          Philip Zeyliger added a comment -

          I'm hijacking this thread for the description (as opposed to the title). Let's start thinking about a high-performance, secure transport for Avro.

          Here's a dump of my current thoughts on this topic, after reading up a bit on SASL, and reading through some of the Hadoop security patches.

          First off, we should probably call this a "protocol". It's a bit tricky, since we've already got a notion of Avro protocols, but "transport" reminds people of http://en.wikipedia.org/wiki/Transport_Layer, i.e., UDP vs TCP, and that's not what we're discussing here. (On the TCP vs UDP front, let's focus our efforts first on a TCP protocol. There might be a lot of value in having a UDP protocol as well, but it's clear that we'll need a TCP one.)

          It's a bit meta, but I'd like us to consider describing Avro's protocol in terms of (and here the terminology falls down) an Avro protocol, or at least in terms of Avro records. Instead of saying "and then there shall be a long, encoded like so, and then it shall be followed by that many bytes", we should just say "then we shall receive a record with the following schema". We already do so in part, and I think that's the right direction. I think it will make the description of the protocol clearer, and it will let the implementation re-use some schema functionality. (I think implementations should use the most type-safe APIs they have available to them, but, hey, that's by definition an implementation detail.)

          In terms of the "primitives", here's what I can think of:

          • CALL; this is the work-horse of the RPC, analogous to http://hadoop.apache.org/avro/docs/1.2.0/spec.html#Call+Format. If we decide to do schema resolution at the handshake level, we would do it here. Returns the response. May throw AuthenticationRequired.
          • AUTHENTICATE: this is the command for authentication. SASL sometimes requires a back and forth (until it's "done"); we'd put the hooks for all of that here.
          • DISCOVER: Asks the server for information about itself. Specifically, servers may tell clients what protocols they support. This may throw AuthenticationRequired or return nothing, if the server wants to be cagey. This is in some sense similar to FB303: https://svn.apache.org/repos/asf/incubator/thrift/trunk/contrib/fb303/if/fb303.thrift . In a friendly environment, a server could tell you who's running it (a username), what machine it's on, arbitrary key/value statistics.
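If the primitives above were described "in terms of Avro records", the commands might look roughly like the following — sketched here as Python dataclasses rather than actual Avro schemas, with every field name invented for illustration:

```python
# Hypothetical command records mirroring the CALL/AUTHENTICATE/DISCOVER
# primitives. A real design would express these as Avro schemas; the
# field names here are invented, not a proposal.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Call:
    serial: int                      # lets responses arrive out of order
    message: str                     # message name from the user protocol
    metadata: Dict[str, bytes] = field(default_factory=dict)
    payload: bytes = b""             # opaque, Avro-encoded request body

@dataclass
class Authenticate:
    mechanism: str                   # SASL mechanism name, e.g. "GSSAPI"
    token: bytes = b""               # one step of the SASL back-and-forth

@dataclass
class Discover:
    pass                             # server replies with its protocols

call = Call(serial=1, message="ping")
```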

          We absolutely need to support piggy-backing of commands. One way to do that is for clients to simply be able to send multiple commands in a row, without waiting for the response. Or having commands able to include subcommands.

          We need to support out-of-order responses and "one way" (don't wait for a response) commands.
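A minimal sketch of the client-side bookkeeping this implies — serial numbers as correlation IDs, with one-way calls simply never registered (names invented):

```python
# Hypothetical client bookkeeping for out-of-order responses: each
# request gets a serial number; responses are matched back by serial,
# so they may arrive in any order. One-way calls get no entry at all.
import itertools

class PendingCalls:
    def __init__(self):
        self._serials = itertools.count(1)
        self._pending = {}

    def send(self, one_way=False):
        serial = next(self._serials)
        if not one_way:
            self._pending[serial] = None  # placeholder until a response lands
        return serial

    def receive(self, serial, response):
        self._pending[serial] = response

    def result(self, serial):
        return self._pending[serial]

p = PendingCalls()
a, b = p.send(), p.send()
p.receive(b, "second")   # responses arrive out of order
p.receive(a, "first")
```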

          We still need to do framing. Also, SASL requires that all bytes after the successful SASL authentication are wrapped by SASL, so servers and clients need to have a state machine that understands that, and wraps appropriately. (We could maybe have avoided framing if we supported framing directly in Avro's string primitive type, like we do in Avro's map type, by having a negative string length indicate a string that is continued.)
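A sketch of length-prefixed framing with an optional wrap step applied once authentication has completed. The `xor_wrap` function below is a stand-in for a real SASL mechanism's wrap/unwrap, used only so the example is self-contained:

```python
# Length-prefixed framing with an optional security layer. After a
# successful SASL negotiation, every frame body must pass through the
# mechanism's wrap/unwrap; xor_wrap is a placeholder, NOT real SASL.
import struct

def xor_wrap(data: bytes) -> bytes:          # stand-in for sasl wrap/unwrap
    return bytes(b ^ 0x55 for b in data)     # (xor is its own inverse)

def write_frame(body: bytes, wrap=None) -> bytes:
    if wrap:
        body = wrap(body)
    return struct.pack(">I", len(body)) + body   # 4-byte big-endian length

def read_frame(buf: bytes, unwrap=None):
    (length,) = struct.unpack(">I", buf[:4])
    body = buf[4 : 4 + length]
    return (unwrap(body) if unwrap else body), buf[4 + length :]

frame = write_frame(b"payload", wrap=xor_wrap)
body, rest = read_frame(frame, unwrap=xor_wrap)
```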

          Finally, we need to think hard about how to version this protocol itself. It's appealing to be able to add commands in the future ("oneway" is an example) or to enrich the response of commands like "DISCOVER". It's noteworthy that text-based protocols like IMAP have had little trouble extending themselves to stuff like SASL, because they could just augment what existing commands did. (RFC 4959 is pretty short.) A simple approach would be to bootstrap it by sending hash(avro protocol schema), and doing much like we do with calls right now.
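The "bootstrap by hash" idea is in the spirit of Avro's existing handshake, which identifies a protocol by the MD5 of its JSON text so the full schema only has to cross the wire when the peer doesn't already know it. A minimal sketch:

```python
# Sketch of hash-based protocol negotiation: the client sends the MD5
# of its protocol's JSON text; if the server already has it cached, no
# schema need be resent. The cache and handshake shape are illustrative.
import hashlib

def protocol_hash(protocol_json: str) -> bytes:
    return hashlib.md5(protocol_json.encode("utf-8")).digest()

known = {}  # server-side cache: hash -> protocol text

def handshake(client_protocol: str) -> bool:
    h = protocol_hash(client_protocol)
    if h in known:
        return True          # server already knows this protocol version
    known[h] = client_protocol
    return False             # client must send the full protocol once

first = handshake('{"protocol": "Echo"}')
second = handshake('{"protocol": "Echo"}')
```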

          Anyway, that's where I am right now. Looking forward to more discussion.

          – Philip

          Doug Cutting added a comment -

          > Regarding the use of "avro:" for urls; perhaps "avros", to leave room for a tcp sockets-based "avro:" url?

          I would like to minimize the number of transports, as they combinatorially explode the compatibility matrix. Rather we should devise a single transport that is both capable of high performance and supports authentication and encryption. The authentication and encryption should be optional, as they are in SASL. We'll want to be able to piggyback the initial SASL handshake on the initial request response, so that services which do not use authentication or encryption pay little penalty.

          Jeff Hammerbacher added a comment -

          Regarding the use of "avro:" for urls; perhaps "avros", to leave room for a tcp sockets-based "avro:" url?

          Doug Cutting added a comment -

          This transport should be suitable for use by Hadoop. In particular, implementations should be able to use it to securely communicate with Hadoop clusters.

          Hadoop plans to use SASL for authentication. Thus this transport should too.


            People

            • Assignee: Unassigned
            • Reporter: Doug Cutting
            • Votes: 1
            • Watchers: 20
