Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.1
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Java's socket transport is non-standard (not in the Avro spec) but might serve as a prototype of a future standard transport (AVRO-341).

      It would be useful to extend it to support SASL-based authentication and encryption.

      1. AVRO-641.patch
        16 kB
        Doug Cutting
      2. AVRO-641.patch
        21 kB
        Doug Cutting
      3. AVRO-641.patch
        21 kB
        Doug Cutting
      4. AVRO-641.patch
        24 kB
        Doug Cutting
      5. AVRO-641.patch
        26 kB
        Doug Cutting
      6. AVRO-641.patch
        36 kB
        Doug Cutting
      7. AVRO-641.patch
        36 kB
        Doug Cutting

        Issue Links

          Activity

          Hide
          Doug Cutting added a comment -

          First version of patch, that doesn't quite work.

          Show
          Doug Cutting added a comment - First version of patch, that doesn't quite work.
          Hide
          Doug Cutting added a comment -

          Here's a working version, with tests.

          I included implementations of the anonymous SASL mechanism, not provided by Oracle Java. This is used by default. My long-term intent is that Avro's socket-based transport will always use SASL, and that applications which do not require authentication or encryption will use the anonymous mechanism. So eventually we'll remove SocketTransceiver and only have SaslSocketTransceiver.

          Show
          Doug Cutting added a comment - Here's a working version, with tests. I included implementations of the anonymous SASL mechanism, not provided by Oracle Java. This is used by default. My long-term intent is that Avro's socket-based transport will always use SASL, and that applications which do not require authentication or encryption will use the anonymous mechanism. So eventually we'll remove SocketTransceiver and only have SaslSocketTransceiver.
          Hide
          Doug Cutting added a comment -

          New version of patch with most info-level logging switched to debug level.

          Show
          Doug Cutting added a comment - New version of patch with most info-level logging switched to debug level.
          Hide
          Todd Lipcon added a comment -

          Doesn't always using SASL result in an extra roundtrip at connection startup? Seems way too expensive to be worth the elegance.

          Show
          Todd Lipcon added a comment - Doesn't always using SASL result in an extra roundtrip at connection startup? Seems way too expensive to be worth the elegance.
          Hide
          Doug Cutting added a comment -

          > Doesn't always using SASL result in an extra roundtrip at connection startup?

          No, it's piggybacked. From the RFC:

          The mechanism consists of a single message from the client to the server.

          Before the first request the client sends the SASL anonymous message, but it does not block waiting for a response, instead it directly proceeds to write the request. The anonymous message has no response and SASL is complete once that message has been sent. If the server does not accept the anonymous mechanism, it closes the connection and logs an error. When a client uses the anonymous mechanism we should probably catch EOFException and suggest that there may have been an authentication problem.

          Show
          Doug Cutting added a comment - > Doesn't always using SASL result in an extra roundtrip at connection startup? No, it's piggybacked. From the RFC: The mechanism consists of a single message from the client to the server. Before the first request the client sends the SASL anonymous message, but it does not block waiting for a response, instead it directly proceeds to write the request. The anonymous message has no response and SASL is complete once that message has been sent. If the server does not accept the anonymous mechanism, it closes the connection and logs an error. When a client uses the anonymous mechanism we should probably catch EOFException and suggest that there may have been an authentication problem.
          Hide
          Todd Lipcon added a comment -

          Ooh, that's clever

          Show
          Todd Lipcon added a comment - Ooh, that's clever
          Hide
          Doug Cutting added a comment -

          Here's a new version that throws AuthenticationException when the anonymous mechanism fails, including a test for this situation.

          Show
          Doug Cutting added a comment - Here's a new version that throws AuthenticationException when the anonymous mechanism fails, including a test for this situation.
          Hide
          Doug Cutting added a comment -

          New version that also throws AuthenticationException when non-anonymous authentication fails, and a test that this is the case.

          Show
          Doug Cutting added a comment - New version that also throws AuthenticationException when non-anonymous authentication fails, and a test that this is the case.
          Hide
          Aaron T. Myers added a comment -

          This implementation is not quite a valid SASL protocol. From http://www.ietf.org/rfc/rfc2222.txt, a valid SASL protocol must provide:

          A definition of the command to initiate the authentication protocol exchange. This command must have as a parameter the mechanism name being selected by the client.

          This is so that a single server providing SASL authentication may support multiple underlying security mechanisms. Whether or not you make a single Avro server support multiple mechanisms, it would probably be useful to send the mechanism name you intend to use, so this can be checked and rejected in the event of mismatch.

          The SASL spec also requires that a SASL protocol define "how the server indicates completion or failure of the exchange". In the current Avro implementation, failure is indicated by disconnecting the socket. Completion isn't really indicated, except by proceeding to operate normally. It would probably be useful to define this precisely, especially when implementing Avro SASL support for other languages.

          The protocol which I wrote for Thrift (https://issues.apache.org/jira/browse/THRIFT-876) does this, as well as adding a definition of how to send additional status data in the event of failure. It's modeled after the IMAP SASL protocol (http://www.ietf.org/rfc/rfc3501.txt). This spec would probably need a change to be useful in Avro, as I don't believe that it could support the anonymous protocol as-written without doing an additional round trip.

          Show
          Aaron T. Myers added a comment - This implementation is not quite a valid SASL protocol. From http://www.ietf.org/rfc/rfc2222.txt , a valid SASL protocol must provide: A definition of the command to initiate the authentication protocol exchange. This command must have as a parameter the mechanism name being selected by the client. This is so that a single server providing SASL authentication may support multiple underlying security mechanisms. Whether or not you make a single Avro server support multiple mechanisms, it would probably be useful to send the mechanism name you intend to use, so this can be checked and rejected in the event of mismatch. The SASL spec also requires that a SASL protocol define "how the server indicates completion or failure of the exchange". In the current Avro implementation, failure is indicated by disconnecting the socket. Completion isn't really indicated, except by proceeding to operate normally. It would probably be useful to define this precisely, especially when implementing Avro SASL support for other languages. The protocol which I wrote for Thrift ( https://issues.apache.org/jira/browse/THRIFT-876 ) does this, as well as adding a definition of how to send additional status data in the event of failure. It's modeled after the IMAP SASL protocol ( http://www.ietf.org/rfc/rfc3501.txt ). This spec would probably need a change to be useful in Avro, as I don't believe that it could support the anonymous protocol as-written without doing an additional round trip.
          Hide
          Doug Cutting added a comment -

          Thanks for the comments, Aaron. As you know, I copied much of the implementation here from your Thrift implementation. Thanks for that too!

          Based on your feedback, I intend to improve correctness of the SASL implementation by:

          • add a status code to each message (as you've done for Thrift)
          • add the SASL mechanism name to the first, status=START message (as you suggest)

          Further, things can be optimized in at least a few places:

          • I am double-framing the payload data and think I can eliminate that.
          • writes can be gathered, to use fewer system calls & packets

          Finally I need to add some documentation before this can be committed.

          Show
          Doug Cutting added a comment - Thanks for the comments, Aaron. As you know, I copied much of the implementation here from your Thrift implementation. Thanks for that too! Based on your feedback, I intend to improve correctness of the SASL implementation by: add a status code to each message (as you've done for Thrift) add the SASL mechanism name to the first, status=START message (as you suggest) Further, things can be optimized in at least a few places: I am double-framing the payload data and think I can eliminate that. writes can be gathered, to use fewer system calls & packets Finally I need to add some documentation before this can be committed.
          Hide
          Doug Cutting added a comment -

          Here's a new version of this patch that I believe is ready to commit.

          Changes include:

          • add a status code to each message in the negotiation
          • add the SASL mechanism name to the first negotiation message
          • transmit a UTF-8 error message when negotiation fails
          • stop double-framing transmitted data
          • optimize framing of non-QOP data at write, merging small frames, reducing packets and read system calls
          • use gathered writes, potentially reducing write system calls
          • add a document describing Avro's SASL profile
          Show
          Doug Cutting added a comment - Here's a new version of this patch that I believe is ready to commit. Changes include: add a status code to each message in the negotiation add the SASL mechanism name to the first negotiation message transmit a UTF-8 error message when negotiation fails stop double-framing transmitted data optimize framing of non-QOP data at write, merging small frames, reducing packets and read system calls use gathered writes, potentially reducing write system calls add a document describing Avro's SASL profile
          Hide
          Aaron T. Myers added a comment -

          This patch looks pretty solid, Doug. I may use some of it as inspiration to improve the Thrift SASL implementation.

          A few comments:

          • The SaslSocketTransceiver.transceive(...) method declaration should have a space after the return type.
          • In SaslSocketTransceiver, it seems like saslResponsePiggybacked can potentially be read before its initialized. Should probably explicitly set a default value.
          • Spelling in the "Overview" section of the protocol spec: "suceeds" -> "succeeds".
          • In the "Negotiation" section of the protocol spec: I don't believe the mechanism name can be UTF-8. I'm pretty sure mechanism names are required to be a subset of ASCII, as defined in http://www.faqs.org/rfcs/rfc4422.html. Further, the mechanism name can only be up to 20 characters, so no need for 4-bytes to specify the length of the mechanism name.
          • In the "Negotiation" section of the protocol spec: Is it really the case that "Once EITHER [capitalization mine] the client or server send a COMPLETE message then negotiation has completed successfully" ? i.e. should it not be "both" ?
          • In the "Session Data" section of the protocol spec: You make reference to "steps 5 and 6" but there is no previous numbering of steps in the protocol.
          Show
          Aaron T. Myers added a comment - This patch looks pretty solid, Doug. I may use some of it as inspiration to improve the Thrift SASL implementation. A few comments: The SaslSocketTransceiver.transceive(...) method declaration should have a space after the return type. In SaslSocketTransceiver, it seems like saslResponsePiggybacked can potentially be read before its initialized. Should probably explicitly set a default value. Spelling in the "Overview" section of the protocol spec: "suceeds" -> "succeeds". In the "Negotiation" section of the protocol spec: I don't believe the mechanism name can be UTF-8. I'm pretty sure mechanism names are required to be a subset of ASCII, as defined in http://www.faqs.org/rfcs/rfc4422.html . Further, the mechanism name can only be up to 20 characters, so no need for 4-bytes to specify the length of the mechanism name. In the "Negotiation" section of the protocol spec: Is it really the case that "Once EITHER [capitalization mine] the client or server send a COMPLETE message then negotiation has completed successfully" ? i.e. should it not be "both" ? In the "Session Data" section of the protocol spec: You make reference to "steps 5 and 6" but there is no previous numbering of steps in the protocol.
          Hide
          Doug Cutting added a comment -

          Thanks for the detailed review, Aaron!

          > The SaslSocketTransceiver.transceive(...) method declaration should have a space after the return type.

          Fixed.

          > In SaslSocketTransceiver, it seems like saslResponsePiggybacked can potentially be read before its initialized. Should probably explicitly set a default value.

          Java fields, unlike variables, always have a default value. The default for a boolean is false.

          > Spelling in the "Overview" section of the protocol spec: "suceeds" -> "succeeds".

          Fixed.

          > In the "Negotiation" section of the protocol spec: I don't believe the mechanism name can be UTF-8. I'm pretty sure mechanism names are required to be a subset of ASCII, as defined in http://www.faqs.org/rfcs/rfc4422.html. Further, the mechanism name can only be up to 20 characters, so no need for 4-bytes to specify the length of the mechanism name.

          UTF-8 and ASCII encode the characters permitted in mechanism names identically. But specifying UTF-8 is confusing, so I've removed that from the document.

          I slightly prefer to leave the mechansim length at 4-bytes, since it simplifies implementation. At each step in the negotiation one always reads the status byte followed by a 4-byte length-prefixed payload. (In the case of the START message, one must read an additional, similarly formatted payload.) So not only would I need to add new methods to read and write one-byte length-prefixed strings, but I'd also have to separately read the payload of each message. If we really wanted to minimize bytes, we might also change the length of a FAIL message to be two bytes. Should I worry more about these wasted bytes?

          > In the "Negotiation" section of the protocol spec: Is it really the case that "Once EITHER [capitalization mine] the client or server send a COMPLETE message then negotiation has completed successfully" ? i.e. should it not be "both" ?

          I believe that when one side sends COMPLETE the other side does not respond, that a single COMPLETE terminates negotiation. I think that's what I've implemented. Without this, implementing an anonymous mechanism without adding a round-trip would be trickier.

          > In the "Session Data" section of the protocol spec: You make reference to "steps 5 and 6" but there is no previous numbering of steps in the protocol.

          Fixed. I wonder where that strange text came from?

          Show
          Doug Cutting added a comment - Thanks for the detailed review, Aaron! > The SaslSocketTransceiver.transceive(...) method declaration should have a space after the return type. Fixed. > In SaslSocketTransceiver, it seems like saslResponsePiggybacked can potentially be read before its initialized. Should probably explicitly set a default value. Java fields, unlike variables, always have a default value. The default for a boolean is false. > Spelling in the "Overview" section of the protocol spec: "suceeds" -> "succeeds". Fixed. > In the "Negotiation" section of the protocol spec: I don't believe the mechanism name can be UTF-8. I'm pretty sure mechanism names are required to be a subset of ASCII, as defined in http://www.faqs.org/rfcs/rfc4422.html . Further, the mechanism name can only be up to 20 characters, so no need for 4-bytes to specify the length of the mechanism name. UTF-8 and ASCII encode the characters permitted in mechanism names identically. But specifying UTF-8 is confusing, so I've removed that from the document. I slightly prefer to leave the mechansim length at 4-bytes, since it simplifies implementation. At each step in the negotiation one always reads the status byte followed by a 4-byte length-prefixed payload. (In the case of the START message, one must read an additional, similarly formatted payload.) So not only would I need to add new methods to read and write one-byte length-prefixed strings, but I'd also have to separately read the payload of each message. If we really wanted to minimize bytes, we might also change the length of a FAIL message to be two bytes. Should I worry more about these wasted bytes? > In the "Negotiation" section of the protocol spec: Is it really the case that "Once EITHER [capitalization mine] the client or server send a COMPLETE message then negotiation has completed successfully" ? i.e. should it not be "both" ? I believe that when one side sends COMPLETE the other side does not respond, that a single COMPLETE terminates negotiation. I think that's what I've implemented. Without this, implementing an anonymous mechanism without adding a round-trip would be trickier. > In the "Session Data" section of the protocol spec: You make reference to "steps 5 and 6" but there is no previous numbering of steps in the protocol. Fixed. I wonder where that strange text came from?
          Hide
          Aaron T. Myers added a comment -

          In case it wasn't clear, +1.

          Java fields, unlike variables, always have a default value. The default for a boolean is false.

          Sure, I just think it's better to be explicit. The only thing you lose is 8 characters in the .java file.

          UTF-8 and ASCII encode the characters permitted in mechanism names identically. But specifying UTF-8 is confusing, so I've removed that from the document.

          Certainly true that there's no difference in encoding, but allowing arbitrary UTF-8 mechanism names is overly-broad. In fact, allowing arbitrary ASCII would be overly-broad. I think it's wise to remove the mention of how mechanism names are encoded, since there's no need to reproduce the SASL specification in this document.

          I slightly prefer to leave the mechansim length at 4-bytes, since it simplifies implementation.

          Totally agree. Just pointing it out, in case you were concerned about compactness.

          I believe that when one side sends COMPLETE the other side does not respond, that a single COMPLETE terminates negotiation.

          Upon further review, I agree with you.

          I wonder where that strange text came from?

          Quite curious indeed.

          Show
          Aaron T. Myers added a comment - In case it wasn't clear, +1. Java fields, unlike variables, always have a default value. The default for a boolean is false. Sure, I just think it's better to be explicit. The only thing you lose is 8 characters in the .java file. UTF-8 and ASCII encode the characters permitted in mechanism names identically. But specifying UTF-8 is confusing, so I've removed that from the document. Certainly true that there's no difference in encoding, but allowing arbitrary UTF-8 mechanism names is overly-broad. In fact, allowing arbitrary ASCII would be overly-broad. I think it's wise to remove the mention of how mechanism names are encoded, since there's no need to reproduce the SASL specification in this document. I slightly prefer to leave the mechansim length at 4-bytes, since it simplifies implementation. Totally agree. Just pointing it out, in case you were concerned about compactness. I believe that when one side sends COMPLETE the other side does not respond, that a single COMPLETE terminates negotiation. Upon further review, I agree with you. I wonder where that strange text came from? Quite curious indeed.
          Hide
          Doug Cutting added a comment -

          > The only thing you lose is 8 characters in the .java file.

          Because I enjoy beating dead horses, I note that the .class file also gains a few bytes. The java compiler, for whatever reason, doesn't know that the default value for booleans is false, that the default value for ints is 0, etc., and generates code in the default constructor for these. In particular, initializing a boolean to false adds the following three instructions to the constructor bytecode:

          aload_0
          iconst_0
          putfield
          

          But, to your point, that won't make any significant difference.

          Thanks again for your reviews, and for all the code & text I copied from your Thrift implementation!

          Show
          Doug Cutting added a comment - > The only thing you lose is 8 characters in the .java file. Because I enjoy beating dead horses, I note that the .class file also gains a few bytes. The java compiler, for whatever reason, doesn't know that the default value for booleans is false, that the default value for ints is 0, etc., and generates code in the default constructor for these. In particular, initializing a boolean to false adds the following three instructions to the constructor bytecode: aload_0 iconst_0 putfield But, to your point, that won't make any significant difference. Thanks again for your reviews, and for all the code & text I copied from your Thrift implementation!
          Hide
          Doug Cutting added a comment -

          I just committed this.

          Show
          Doug Cutting added a comment - I just committed this.

            People

            • Assignee:
              Doug Cutting
              Reporter:
              Doug Cutting
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development