Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 2.0.0-alpha
    • Component/s: None
    • Labels: None

      Attachments
    1. ipcHeader10.patch
      29 kB
      Sanjay Radia
    2. ipcHeader9.patch
      27 kB
      Sanjay Radia
    3. ipcHeader9.patch
      27 kB
      Sanjay Radia
    4. ipcHeader8.patch
      28 kB
      Sanjay Radia
    5. ipcHeader7.patch
      28 kB
      Sanjay Radia
    6. ipcHeader5.patch
      32 kB
      Sanjay Radia
    7. ipcHeader4.patch
      31 kB
      Sanjay Radia
    8. ipcHeader3.patch
      30 kB
      Sanjay Radia
    9. HADOOP-7557.patch
      7 kB
      Doug Cutting
    10. ipcHeader2.patch
      75 kB
      Sanjay Radia
    11. ipcHeader1.patch
      66 kB
      Sanjay Radia
    12. IpcHeader.proto
      0.3 kB
      Sanjay Radia

      Issue Links

        Activity

        Sanjay Radia added a comment -

        Proposal: define the headers in Protocol Buffers.

        Todd Lipcon added a comment -

        Sounds good to me. I am pro protobuf.

        Doug Cutting added a comment -

        Extensibility in internet protocols (SMTP, DNS, LDAP, HTTP, SASL Digest, etc.) is usually provided through lists of key/value pairs. Implementations do not require any particular library, and protocol specifications are kept more independent.

        It would be good if our protocol metadata is easily interconvertible with other network protocols, e.g., if Hadoop RPC metadata can be easily copied into HTTP headers and/or Avro RPC metadata and vice versa.

        So I would advocate that extensibility be provided through the well-known protocol extensibility mechanism of a list of key/value pairs.

        Do we have a case where this is inadequate?

        Todd Lipcon added a comment -

        IMO key-value pairs as used in HTTP/etc are not appropriate for RPC due to their lack of compactness. A typical HTTP payload is many KB, so a multi-hundred-byte header seems fine. A typical Hadoop RPC payload may only be a few bytes, so a lengthy header ("X-Hadoop-RPC-Version: 1.0") is a lot of overhead. The obvious solution is to have well-defined numeric tags for each header maintained centrally, and then just use those tags instead. And that's basically protocol buffers.

        Doug Cutting added a comment -

        Protocol buffers are a lot more than just that. If that's all that's required, then the protocol specification can define constants for these, e.g.:

        [16-bit key] [16-bit value_length] [value]

        A particular key constant can be reserved for "dynamic" parameters, e.g.,

        [key=OTHER] [16-bit payload_length] [16-bit key_length] [key] [value]

        (The length of the value is payload_length - key_length.)

        A "dynamic" parameter that's used a lot could get incorporated into the next version of the protocol specification.

        I also question whether the size of RPC headers is a significant factor in Hadoop performance. Even when the payload is only a few bytes, the dominant cost is not network bandwidth nor request parsing.
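        For illustration only, a minimal Java sketch of the key/value framing described above; the class name, method names, and key constants are hypothetical and not taken from any attached patch.

        import java.io.DataOutput;
        import java.io.IOException;

        // Hypothetical sketch of the framing above; key constants and names are illustrative.
        public class KvHeaderWriter {
          public static final int OTHER = 0xFFFF; // reserved key for "dynamic" parameters

          // [16-bit key] [16-bit value_length] [value]
          public static void writePair(DataOutput out, int key, byte[] value) throws IOException {
            out.writeShort(key);
            out.writeShort(value.length);
            out.write(value);
          }

          // [key=OTHER] [16-bit payload_length] [16-bit key_length] [key] [value]
          // (value_length is implicitly payload_length - key_length)
          public static void writeDynamic(DataOutput out, String name, byte[] value) throws IOException {
            byte[] nameBytes = name.getBytes("UTF-8");
            out.writeShort(OTHER);
            out.writeShort(nameBytes.length + value.length); // payload_length
            out.writeShort(nameBytes.length);                // key_length
            out.write(nameBytes);
            out.write(value);
          }
        }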

        Todd Lipcon added a comment -

        Yea, I guess PBs also have type information. But type information means the value can be much shorter as well.

        I also question whether the size of RPC headers is a significant factor in Hadoop performance. Even when the payload is only a few bytes, the dominant cost is not network bandwidth nor request parsing.

        In Hadoop job performance, I agree. In something like HBase, though, it is a significant amount of the time spent when requests are doing small reads out of cache. Currently HBase has a fork of Hadoop IPC, but it would be good to keep that use case in mind in hopes that down the road, we can converge implementations again.

        It seems the easiest way to settle the performance debate is to do a benchmark?

        Doug Cutting added a comment -

        Todd, if you have a complex extension value then you might use PB to encode that value, but I think we ought to avoid PB for the top-level key/value pair extension mechanism. Our network protocol should be something we'd like to see as an Internet RFC. Should RFCs include PB-defined structs?

        Sanjay Radia added a comment -

        Hadoop RPC protocols will need to use PB or Avro and hence no chance of being an RFC. I think it is okay for Hadoop's IPC header to be in PB or Avro.

        Sanjay Radia added a comment -

        Attached is a draft IpcHeader in PB.

        Doug Cutting added a comment -

        I think PB provides little advantage here and complicates specification of a wire protocol. I'd hope our long-term plan is to move towards a standard wire protocol, either adopting an existing standard protocol or developing our own interoperable, well-specified standard. The PB file attached is equivalent to defining three string constant keys whose values are to be interpreted as strings in RFC-822-style headers. Or these constants might be assigned numeric values in the protocol specification, as they are in PB, although that might complicate embedding this protocol in other standard protocols.

        Sanjay Radia added a comment -
        • One can define interoperable well-specified standards in several ways including using PB.
        • Interoperable well-specified standards have to be defined not just for the connection header but for the HDFS and MR protocols. I am sure you are not suggesting that we drop the use of tools like PB and Avro completely at all layers.
        • Given that we are going to use PB/Avro at upper layers, we don't want to make a special case for the connection header. PB makes it easy to extend the headers. Yes I know we can do this manually but this is much easier with PB. Indeed one can argue that all of what PB and Avro do can be manually done and have been manually done in various protocols for the last 30 years.
        Doug Cutting added a comment -

        > PB makes it easy to extend the headers.

        Map<String,ByteBuffer> is naturally extensible too.

        It is a mistake to use a particular serialization system to specify a container for serialized data that's meant to be independent of any particular serialization system.
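        As a rough illustration of the Map<String,ByteBuffer> idea (this is only a sketch and is not the wire format of the attached HADOOP-7557.patch), such a header could be framed with plain DataOutput/DataInput, with readers simply ignoring keys they don't understand:

        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.util.HashMap;
        import java.util.Map;

        // Illustrative only: a string-keyed header with opaque byte values.
        public class MapHeader {
          public static void write(DataOutput out, Map<String, ByteBuffer> header) throws IOException {
            out.writeInt(header.size());
            for (Map.Entry<String, ByteBuffer> e : header.entrySet()) {
              out.writeUTF(e.getKey());
              ByteBuffer v = e.getValue().duplicate();
              byte[] bytes = new byte[v.remaining()];
              v.get(bytes);
              out.writeInt(bytes.length);
              out.write(bytes);
            }
          }

          public static Map<String, ByteBuffer> read(DataInput in) throws IOException {
            int n = in.readInt();
            Map<String, ByteBuffer> header = new HashMap<String, ByteBuffer>();
            for (int i = 0; i < n; i++) {
              String key = in.readUTF();
              byte[] value = new byte[in.readInt()];
              in.readFully(value);
              header.put(key, ByteBuffer.wrap(value)); // unknown keys can simply be ignored
            }
            return header;
          }
        }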

        Sanjay Radia added a comment -

        Patch attached.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498224/ipcHeader1.patch
        against trunk revision .

        -1 @author. The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/277//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/277//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/277//console

        This message is automatically generated.

        Steve Loughran added a comment -

        One thing I'd like to see is IPC messages including a CRC32 checksum, to compensate for the limitations of the TCP checksum. Ideally it should go as a footer (some papers argue for this in terms of creation, validation, and catching two messages merged into each other). If a footer can't be done, it would be nice to somehow sneak in the checksum here. Would that be possible?
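        A minimal sketch of the kind of CRC32 footer Steve describes, assuming the serialized IPC message is already available as a byte array; the length-prefixed framing shown here is hypothetical.

        import java.io.DataOutput;
        import java.io.IOException;
        import java.util.zip.CRC32;

        // Illustrative only: length-prefixed payload followed by a CRC32 footer.
        public class Crc32Footer {
          public static void writeWithFooter(DataOutput out, byte[] message) throws IOException {
            CRC32 crc = new CRC32();
            crc.update(message, 0, message.length);
            out.writeInt(message.length);        // length prefix
            out.write(message);                  // payload
            out.writeInt((int) crc.getValue());  // 32-bit checksum over the payload
          }
        }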

        Sanjay Radia added a comment -

        Please hold off reviewing the patch in detail - I haven't quite finished one small part.

        Sanjay Radia added a comment -

        Updated patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498265/ipcHeader2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/279//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/279//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/279//console

        This message is automatically generated.

        Doug Cutting added a comment -

        Your patch does more than just make the header extensible. It also adds new stuff into the header, etc.

        Here's a considerably simpler patch that only changes the wire format of the header to be something that fields can be easily added to and removed from.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498683/HADOOP-7557.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/287//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/287//console

        This message is automatically generated.

        Sanjay Radia added a comment -

        The only extra field added was the protocolKind. Our RPC headers are not well designed.
        ProtocolKind is necessary since the next layer currently supports two different serializations and can support additional ones.

        As far as the smaller patch goes – that is mostly due to auto-generated files from PB. HDFS, since Todd's data transfer PB work, has chosen to commit the PB-generated files rather than generate them via mvn.

        If I were to add a new field to the main header (the "hrpc" one) to indicate the format of the next layer, would it satisfy your concerns? It would allow one to write all the layers in Writables, PB, or Avro.

        Todd Lipcon added a comment -

        To be honest, all of this pluggability in serializations seems to be counter to the goal of interoperability here. If we just use protobufs everywhere, then anyone can come along in any reasonably common language and get a working implementation of Hadoop RPC on either the server or client side. If every single piece has a flag indicating whether it might be protobuf, avro, Writable, msgpack, JSON, etc., then anyone trying to implement the protocol will be faced with an insurmountable amount of work.

        Making the protocol entirely generalizable smacks of REST to me - people talk about REST as if you could have a "REST client", when in fact it's not general at all. Do we really want Hadoop RPC to evolve to the point where I can have a Hadoop RPC client that only understands Avro, and you can have a server that only understands Protobuf, and we can't talk to each other despite both "implementing Hadoop RPC"?

        Sanjay Radia added a comment -

        I personally prefer to use PB for protocols - so I agree with you, Todd.
        My main motivation for pluggability is for transitions across fundamental protocol changes. Once we go to PB I doubt we will need to change it unless we find a flaw. Pluggability does provide an option for inter-operating with 0.20.

        I feel that adding a field to the "hrpc" header mentioned in my previous comment allows the Hadoop RPC library to be used by other projects with a different serialization; I understood this to be one of Doug's motivations. From a pure protocol layering point of view, having a field for the protocol at the next layer does make sense.

        Doug Cutting added a comment -

        Todd, I mostly agree with you.

        There are a few goals one might have for RPC in Hadoop:

        1. support compatibility across versions of Hadoop, permitting rolling upgrades, etc.
        2. permit RPC clients and servers in languages besides Java
        3. change RPC, Shuffle and HDFS to use common communications layer, simplifying the implementation and evolution of security, performance, etc.

        To my understanding Sanjay is only currently pursuing the first, and in particular wants to be able to provide compatibility between 0.20, 0.23 and 0.24. To do this, he'd like to permit both an old serialization (Writable) and a new serialization (Protobuf).

        I am interested in all three, and would propose that we not attempt to implement wire-compatibility with 0.20 and perhaps not even with 0.23, but rather select a single message serialization and client/server protocol that's programming language-independent and only promise wire-compatibility between releases once we've made a transition to those. I don't believe that anything is language independent until it has implementations in multiple languages. The more we add features to Hadoop's existing RPC framework the further we get from it supporting multiple languages and using a standard client/server protocol.

        My interest in this particular issue is simply to try to keep the communications layer something that we might someday reasonably describe as a network standard, i.e., comparable to SASL or HTTP.

        Doug Cutting added a comment -

        > As far as smaller patch – that is mostly due to auto generated files from PB.

        Even without that, if something can be done without adding a dependency and in less code then what real value does the dependency bring? We don't expect to be changing this header frequently. This is not a user extension point.

        > If I were to add a new field to the main header (the "hrpc" one) to indicate the format of the next layer would it satisfy your concerns?

        That would be better. Then all of the protobuf-specific stuff could be bundled into classes that already require protobuf. But if there's stuff like security that's used by all message encodings, why not use an encoding-independent format in the header? I don't see the value of supporting both a Writable and protobuf version of these headers.

        Sanjay Radia added a comment -

        I am interested in 2 and 3. I haven't stated otherwise. Indeed support for non-java languages has been a very important motivation for a long time.

        Hadoop RPC started to become pluggable as part of the Avro thrust. Before I started to work on this recently, Hadoop RPC already had multiple RPC engines (Writable, Avro, PB). The extra field (protocolKind) I added was to distinguish between the multiple RPC engines and catch bugs when there is a mismatch - this should have been done when we added the multiple RPC engines.

        We agree on the goal: "..select a single message serialization and client/server protocol that's programming language-independent ..".
        Given that the other layers are moving to PB or have already moved to PB (HDFS's data transfer protocol and MR2 protocols) why not let the IPC header be in PB?
        Or are you proposing that all the upper protocol layers use a similar technique to the one you used for the IpcHeader (Map<String,ByteBuffer>) and that we move away from PB and Avro for Hadoop protocols?

        Sanjay Radia added a comment -

        > This is not a user extension point....
        > ... like security that's used by all message encodings, why not use an encoding-independent format in the header? ...

        Essentially you are arguing that IpcHeader does not need to be extensible and that essentially we should merge the "hrpc" header and the ipcHeader. I do believe it is missing the "protocolKind" field since there are 3 existing protocols (3 rpc engines) at the next layer. Hence if the header had been extensible in the past we would have been able to add the field easily in a backward compatible fashion.

        Todd Lipcon added a comment -

        Even without that, if something can be done without adding a dependency and in less code then what real value does the dependency bring?

        In this case, using a dependency on a format that already has efficient implementations in many languages makes it easier for a new language to implement HRPC. It's the difference between custom code to parse your generic format (tag-value pairs), vs just running "protoc -gen <mylanguage>" and calling "IpcHeader.parseFrom(stream)".

        This is coming from the perspective of having implemented Hadoop RPC servers and clients in both Erlang and Python. In both cases it would have been easier to describe the protocol and headers in terms of varint-prefixed protobufs than in custom serializations.

        I don't mean to block progress - if you two can agree on a patch, I'm fine with it. Just giving my opinion that all of this extra layering and switchability only makes it harder for someone to implement the protocol, not easier.
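        As a sketch of the varint-prefixed framing Todd mentions (assuming protobuf-java is on the classpath; the concrete generated header class would come from whatever the attached .proto file defines):

        import java.io.IOException;
        import java.io.OutputStream;
        import com.google.protobuf.MessageLite;

        // Illustrative only: protobuf's built-in delimited helpers do the varint framing.
        // Every generated message class also exposes a static parseDelimitedFrom(InputStream)
        // for the read side, so a client can parse the header straight off the socket stream.
        public final class DelimitedFraming {
          private DelimitedFraming() {}

          public static void writeDelimited(OutputStream out, MessageLite msg) throws IOException {
            msg.writeDelimitedTo(out); // varint length prefix followed by the serialized message
          }
        }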

        Doug Cutting added a comment -

        > The extra field (protocolKind) I added was to distinguish between the multiple PRC Engines and catch bugs when there is a mismatch - this should have been done when we added the multiple RPC engines.

        I disagree. We need to have two engines: present and future. A developer should be able to configure things to use future, but only present should ever be used in production. Our wire format should not be engineered around supporting multiple wire formats in production. Rather a given release should support a single wire format. We'll someday switch to future, which supports inter-version and inter-language compatibility, security, has good performance, etc. At that point we discard present.

        > Essentially you are arguing that IpcHeader does not need to be extensible and that essentially we should merge the "hrpc" header and the ipcHeader.

        No, the wire format of headers is a property of the protocol engine, not a universal thing. The existing AvroRpcEngine is a work in progress. It piggybacks on WritableRpc for its wire protocol, as its initial goal was just to debug message serialization, then subsequently to change the full protocol wire format, including headers, security, etc. Work on AvroRpcEngine stopped when e14 vetoed this proposed present/future incremental approach. If that work were to continue, I might next try to switch AvroRpcEngine to a different underlying network protocol (e.g., HTTP, Avro's SASL profile, or something new) that's language independent, has other implementations, etc. Or work could be done on switching from Avro reflect to specific. But we shouldn't extend the legacy RPC format to also handle the future format, otherwise we're stuck with the legacy forever.

        Doug Cutting added a comment -

        Todd, I agree that we should switch to an RPC system that uses but a single serialization system. This header format could go into a new ProtobufRpcEngine, which determines headers, handshakes, message formats, etc. If Sanjay comes up with a fully-functional implementation of ProtobufRpcEngine before anyone else (like me) comes up with a fully-functional AvroRpcEngine then we should switch to that. It's probably silly to work on both.

        But if one were to design a client/server container to support messages serialized with different serialization systems (as, e.g., HTTP is) then I think it would be an abstraction violation to use a particular message serialization system to define its headers. But you're right that we probably don't need to design such a meta-container in the first place. Thrift already defines various containers, as does Avro. Protobuf does not, so if protobuf is selected for message serialization then we'll need to design a protobuf-specific client/server container protocol, implement it in different languages, etc. But it need not also support Writable or Avro.

        Sanjay Radia added a comment -

        Some background of the current Wire protocol:
        ConnectionHeader IpcConnectionHeader RpcPayload RpcPayload RpcPayload...

        The RpcPayload can be one of several kinds - basic IPC calls, writableRPC, pbRPC etc.
        Note the basic IPC calls are the original Hadoop IPC calls - there are several tests that use these.
        All the Payloads have to be of the same type.
        If there is a mismatch of the RpcPayload kind, one will usually get a deserialization error.

        This jira proposes to make the IpcConnectionHeader extensible.

        I added the protocolKind to distinguish between these.
        Perhaps I should have called the field RPCEngine type instead.
        It catches the unpleasant de-serialization error and reports a nicer error message.
        Most protocols that have multiple instances of a layer add such a "kind" field.
        I believe it is wise to have this field until we remove all but one and decide that there will never be another.
        (A fine decision to make.)

        Doug Cutting added a comment -

        > I added the protocolKind to distinguish between these.

        This seems analogous to HTTP's Content-Type and may be useful if we expect to support multiple kinds of RPC payloads from a server. Do we expect that long-term? If we do only short-term we might handle different kinds of payloads with the ConnectionHeader version.

        Also, why are ConnectionHeader and IpcConnectionHeader separate? It makes sense to separate them if IpcConnectionHeader is specific to the protocolKind, but otherwise, if both headers are universal, perhaps they could be combined. If they're universal then they shouldn't depend on Writable, Protocol Buffers or Avro, but rather should be explicitly and fully specified in the protocol.

        Sanjay Radia added a comment -

        Currently it does support multiple payload types - unfortunately a protocolKind field was not added when multiple payload types were added. I just added it to be consistent with the current implementation. We can remove it if we move to a single RPC payload type.

        The code seems to make a distinction between the IPC layer and the RPC layer. There are "parallel ipc calls" that exist in the test code. My impression is that one can build different kinds of RPC or other communications on top of IPC.
        The layers and the headers are not very clean but it appears that the original purpose was to be flexible. I think you made some of design more flexible when you added Avro support alongside the existing writable stuff.

        The separation between the connection header and the ipc connection header has also puzzled me - I don't know the historical reasons. It is wise to leave the initial connection header intact so that a connection from an old client is nicely rejected.

        Doug Cutting added a comment -

        > Currently it does support multiple payload types

        Are you referring to Avro? If so, that's not quite right. Avro requests and responses are wrapped in a Writable. The wire format of all requests and responses is currently Writable. AvroRpcEngine layers on top of the WritableRpcEngine, embedding Avro in the Writable RPC format on the wire.

        Sanjay Radia added a comment -

        I meant the PB Writable one - it's in the MR project and needs to be moved to common.

        Sanjay Radia added a comment -

        Update on the current Wire protocol:
        ConnectionHeader IpcConnectionHeader [IpcCallHeader,RpcPayload] [IpcCallHeader,RpcPayload] ...
        Currently the IpcCallHeader is simply the CallId (an int); it is not even defined as a header but is just serialized before the RpcPayload:

        // d is the data output buffer and param is rpcPayload.
        d.writeInt(call.id);
        call.param.write(d);
        

        I suggest that we make the ipcCallHeader a real header.
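        Purely as an illustration of what a "real", extensible per-call header could look like (the field set and length-prefixing here are hypothetical and not taken from any attached patch):

        import java.io.DataOutput;
        import java.io.IOException;

        // Illustrative only: a structured call header instead of a bare int.
        class IpcCallHeader {
          final int callId;
          final int rpcKind; // which RPC engine the following payload belongs to

          IpcCallHeader(int callId, int rpcKind) {
            this.callId = callId;
            this.rpcKind = rpcKind;
          }

          void write(DataOutput d) throws IOException {
            d.writeInt(8);       // header length, so readers can skip unknown trailing fields
            d.writeInt(callId);
            d.writeInt(rpcKind);
          }
        }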

        Doug Cutting added a comment -

        Is this wire protocol meant to be a serialization-independent container? If so, then I'd argue that the format of its headers should be simple and fully-specified in the container specification and not rely on a particular serialization engine. The payloads will be specific to serialization engines and their serialization engine will be named in headers. Does that make sense?

        Benoit Sigoure added a comment -

        If I can just throw in my 2¢: I implemented an HBase client entirely from scratch, so I had to figure out the Hadoop RPC protocol and its HBase variant – which BTW I documented here.

        I completely agree with what Todd said. I wouldn't have said it better myself, so let me quote him:

        To be honest, all of this pluggability in serializations seems to be counter to the goal of interoperability here. If we just use protobufs everywhere, then anyone can come along in any reasonably common language and get a working implementation of Hadoop RPC on either the server or client side. If every single piece has a flag indicating whether it might be protobuf, avro, Writable, msgpack, JSON, etc., then anyone trying to implement the protocol will be faced with an insurmountable amount of work.

        The current Hadoop and HBase RPC protocols are unnecessarily hard to implement. Using a unified serialization mechanism such as PB for everything is a great step towards making clients significantly simpler.

        Simpler is better. KISS.

        Sanjay Radia added a comment -

        Doug, given that Hadoop has moved to Protocol Buffers over the last month, are you still opposed to the headers being in PB?
        I do see your point about it being nicer if the headers were serialization neutral, but am not sure if it makes much of a difference in practice.

        Doug Cutting added a comment -

        I still see WritableRpcEngine in the source tree and used as the default implementation. As long as there's the RpcEngine abstraction then there's support for different RPC serializations. So if you're intent on hardwiring Protobuf, I'd suggest removing RpcEngine.

        Suresh Srinivas added a comment -

        I initially liked the key, value idea that Doug proposed. However, after thinking a bit, such a mechanism would require the following functionality:
        1. Ability to add optional keys that get ignored on servers/clients where they are not understood.
        2. Ability to add mandatory keys.

        Writing code where mandatory keys are enforced and optional keys are ignored is not easy. Also, capturing which keys are optional and defining all this as an interface definition (instead of free-form key/value text) is non-trivial. These are all capabilities provided by protobuf. Protobuf is already a required part of MRv2 and HDFS. Instead of building all that functionality, my vote is to use protobuf.

        Sanjay Radia added a comment -

        For a very long time our headers have been Writable and we have supported RpcEngines - clearly the headers are not neutral to all serializations. Anyone wanting to use PB has to be prepared to write the headers in Writable. Hence for a very long time
        our implementation favored Writables. If the notion of RpcEngine made sense in current trunk, why does it not make sense with PB?

        We are now proposing making the headers extensible using PB.
        You have proposed another solution of key-value pairs - this also makes the headers extensible. We would have to add notions of which keys are required and which are optional to make this work well, and we would be reinventing some of the PB technology.
        The machinery you are proposing is not all that simple once you fill in all the necessary details.

        I am arguing that it is perfectly okay for the headers to pick one form of serialization that is extensible.
        We can supply a small C or Java library to generate the PB headers for others to use.

        Sanjay Radia added a comment -

        @Sanjay>> If I were to add a new field to the main header (the "hrpc" one) to indicate the format of the next layer would it satisfy your concerns?
        @Doug > That would be better. .. But if there's stuff like security that's used by all message encodings, why not use an encoding-independent format in the header?

        Would adding this field be a reasonable compromise?
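
        Concretely, something like the following (a sketch only; the constant names and values are hypothetical, not the ones in the patch):

        // Sketch of the proposed client-side preamble; all names and values below are placeholders.
        static final int CURRENT_VERSION = 7;          // existing IPC version byte (placeholder value)
        static final int AUTH_SIMPLE = 80;             // existing auth-method byte (placeholder value)
        static final int SERIALIZATION_PROTOBUF = 0;   // proposed: format of the headers that follow

        static void writeConnectionPreamble(java.io.DataOutputStream out) throws java.io.IOException {
          out.write(new byte[] {'h', 'r', 'p', 'c'});  // existing "hrpc" magic
          out.write(CURRENT_VERSION);
          out.write(AUTH_SIMPLE);
          out.write(SERIALIZATION_PROTOBUF);           // new byte: tells the server how the next layer is encoded
          out.flush();
        }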

        Doug Cutting added a comment -

        Suresh> Writing code where mandatory keys are enforced and optional keys are ignored is not easy.

        Why is that hard? The implementation either requires the keys (mandatory) or first checks whether they're specified (optional).
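
        A rough sketch of that logic (key names here are illustrative only):

        // Hypothetical key/value header handling: mandatory keys are enforced,
        // optional or unknown keys are simply skipped.
        static void processHeader(java.util.Map<String, String> header) throws java.io.IOException {
          String protocol = header.get("protocol");            // mandatory key
          if (protocol == null) {
            throw new java.io.IOException("missing required header key: protocol");
          }
          String effectiveUser = header.get("effectiveUser");  // optional key
          if (effectiveUser != null) {
            // act on it only when present
          }
          // keys this implementation does not understand are ignored
        }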

        I provided a patch that's equivalent in functionality to Sanjay's protobuf patch yet is about half the size, disregarding the protobuf-generated code in his patch. I don't think adding some code to make some items optional would make it much bigger or more complicated.

        Do you have real examples of RPC headers that would make this unmanageable using the approach I presented in that patch?

        Sanjay> Would adding this field be a reasonable compromise?

        Seems like added complexity without much added value.

        Why not just get rid of Writable and RpcEngine and hardwire protobuf from the top down? That would be easier to support and would be easier for folks to implement in other languages. Why have an implementation that pretends to be serialization independent? What advantages does that provide?

        Devaraj Das added a comment -

        Why not just get rid of Writable and RpcEngine and hardwire protobuf from the top down? That would be easier to support and would be easier for folks to implement in other languages. Why have an implementation that pretends to be serialization independent? What advantages does that provide?

        +1 on this. Let's phase out writable over time. It's good in theory to have a pluggable RPC engine but in practice it seems unlikely we'd have yet another impl.

        Eli Collins added a comment -

        +1 on this as well. (RpcEngine is a small interface so I could go either way there; once there's just a PB RpcEngine we can fold it into ProtobufRpcEngine and clean up things that no longer need to be generic.)

        Sanjay Radia added a comment -
        • There is clean layering between the IPC layer and the RPC layer - we should retain that.
        • If the RPC engine wasn't there I would not add it. But it is there and it is now clean. I am glad that Doug added it.
          • It can help us provide 0.20 compatibility - 0.20 compatibility is one of the things that prevents FB from moving to trunk. I personally don't want to do the work for 0.20 compatibility, but if FB can do it for their 0.20 and it helps them move to trunk, this is real goodness for the community. (Dhruba and I have discussed this issue.)
          • It can also help us support compatibility with HBase RPC in the future, and both projects can consider sharing the RPC layer.
          • If there is any future discontinuity in our RPC, the RPC engine can help us deal with the discontinuity.
        • If another project wants to use Hadoop RPC with a different serialization, then it can (say Avro or Thrift).
        • By adding the field I have suggested, Doug can add his k-v pairs as a parallel structure if he wishes. Indeed, that field should be there anyway on good protocol/layering principles. The k-v proposal is clearly another form of serialization and in that sense not really neutral; but as Doug argues, it does have the advantage over PB that it does not require a tool to generate it.
        • I am okay with making PB the default engine and am neutral about removing the Writable engine.
        Suresh Srinivas added a comment -

        I provided a patch that's equivalent in functionality to Sanjay's protobuf patch yet is about half the size, disregarding the protobuf generated code in his patch.

        I feel the intent of your patch was to show how this can be done with key/values, and hence the patch may not be in its final shape. I believe the patch is missing several things.

        1. The header, keys and values are not clearly documented.
        2. No tests are added.
        3. The fact that you are creating the ugi in the for loop might not be correct. It may have to be outside the for loop.
        4. Other than Key.EOH, everything is optional (rightfully so in some cases). However, I am not sure Key.PROTOCOL could be optional, so checks are needed to ensure the mandatory key PROTOCOL is sent.

        BTW, to me it is maintainability, more than size, that matters.

        I don't think adding some code to make some items optional would make it much bigger or more complicated.

        As I said earlier, in your patch, it seems to me that all are optional except EOH. See the following comment:

        Do you have real examples of RPC headers that would make this unmanageable using the approach I presented in that patch?

        Right now there are 3 key/value pairs. Writing the code to enforce mandatory vs. optional for just these three would expand the code further. Some requirements I have that are satisfied nicely by protobuf are:

        1. Documenting the header - Hadoop documentation has been poor and incomplete. The fact that anyone who wants to connect over RPC needs to consult the document and then look at the implementation is painful. Using things such as an ordinal to capture a key value, and having to look at Java code for protocol strings (such as AuthMethod.DIGEST etc.), also makes it hard.
        2. Protobuf gives other capabilities that are also of interest, such as default values for optional fields and the ability to send not just a String as a value but an object.
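
        For instance (a sketch only, with hypothetical fields rather than the actual IpcConnectionContext definition):

        message ContextExampleProto {
          optional uint32 maxIdleTime = 1 [default = 10000]; // default value for an optional field
          optional UserInfoExampleProto userInfo = 2;        // value is a message, not just a String
        }
        message UserInfoExampleProto {
          optional string effectiveUser = 1;
          optional string realUser = 2;
        }
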
        Doug Cutting added a comment -

        Sanjay> [RpcEngine] can help us provide 20 compatibility

        I'm confused. If you switch the RPC headers to protobuf how can you provide 20 compatibility? If the protobuf headers are confined to ProtobufRpcEngine then I have no issues. I thought the proposal here is to change RPC format so that all RpcEngine implementations use protobuf headers. That would break 0.20 compatibility, no?

        Sanjay> Another project can use Hadoop RPC but a different serialization then it can. (Say avro or thrift).

        If we don't have multiple implementations of an extension API that are regularly used, then the chances that it's actually possible to create other implementations that work are small. The reason we removed Avro was that implementing RpcEngine alone was no longer sufficient: there were assumptions that Protobuf was used in other places in the code. RpcEngine is currently a broken abstraction, providing a false claim of pluggability.

        Sanjay> Doug can add his k-v pair as a parallel structure if he wishes.

        I have no desire to add a k-v parallel structure. I'm just trying to help keep the RPC architecture simple and sane.

        Suresh> it seems to me that all are optional except EOH

        The protobuf documentation says, "Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal." So (arguably) all or most fields should be optional.

        Suresh> Documenting header - Hadoop documentation has been poor and incomplete.

        Indeed. If Hadoop RPC is to be interoperable then it needs a language-independent specification and multiple implementations. That specification might specify its headers either as some protobuf messages or as some RFC-like text. Either would work fine. The default value for optional header fields could be specified in either. But what's also required is a description of how to handle the various values for these header fields, including the default. Until then, the implementation is the specification.

        Suresh Srinivas added a comment -

        Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal." So (arguably) all or most fields should be optional.

        That is *some* engineers. In our protobuf we use both required and optional.

        Indeed. If Hadoop RPC is to be interoperable then it needs a language-independent specification and multiple implementations. That specification might specify its headers either as some protobuf messages or as some RFC-like text.

        Protobuf is my vote.

        But what's also required is a description of how to handle the various values for these header fields, including the default. Until then, the implementation is the specification.

        This documentation can be made available in the .proto files (as is done in the existing .proto files).

        Sanjay Radia added a comment -

        >Sanjay> ... can help us provide 20 compatibility
        >I'm confused. If you switch the RPC headers to protobuf how can you provide 20 compatibility?
        Not PB but the RPC engine can allow older impls to live side by side.

        >The reason we removed Avro was that implementing RpcEngine alone was no longer sufficient: there were assumptions that Protobuf was used in other places in the code. RpcEngine is currently a broken abstraction, providing a false claim of plugability.

        Doug, your model was that setting a single flag would allow one to switch from one serialization to another.
        The model that was implemented allowed one to use one engine or another, but one had to have code to do that.

        Suresh Srinivas added a comment -

        The reason we removed Avro was that implementing RpcEngine alone was no longer sufficient: there were assumptions that Protobuf was used in other places in the code. RpcEngine is currently a broken abstraction, providing a false claim of plugability.

        This is not correct.

        I think quite a bit of discussion has already happened around this. See the comment: https://issues.apache.org/jira/browse/HDFS-2664?focusedCommentId=13168625&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13168625

        BTW, look at the HDFS-1623 branch. There, protocol implementations of both Writable and protobuf co-exist in the same NN server. Look at the TestMultipleProtocolServer test. Multiple versions of the same protocol exist.

        Doug Cutting added a comment -

        > Not PB but the RPC engine can allow older impls to live side by side.

        I'm not sure what you're saying. Would an unchanged 0.20 client still be able to talk to a server with PB headers? Or would we backport the PB header stuff to 0.20? Or something else?

        Sanjay Radia added a comment -

        @Doug> I'm not sure what you're saying. Would an unchanged 0.20 client...
        An unchanged 0.20 client. Based on the version number in the boot header, we can call into an RPC engine that supports the 0.20 format, i.e. we would have to write a 0.20 RPC engine.
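
        Roughly along these lines (a sketch only; the version cutoff, names, and server internals are hypothetical):

        // Hypothetical dispatch on the version byte read from the connection preamble.
        interface WireFormatReader {
          void readAndProcess(java.io.DataInputStream in) throws java.io.IOException;
        }

        static void dispatchByVersion(int versionByte, WireFormatReader legacy20Reader,
            WireFormatReader currentReader, java.io.DataInputStream in) throws java.io.IOException {
          if (versionByte <= 4) {                // placeholder cutoff for the 0.20 wire format
            legacy20Reader.readAndProcess(in);   // unchanged 0.20 clients go to a 0.20-format engine
          } else {
            currentReader.readAndProcess(in);    // newer clients send the extensible PB headers
          }
        }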

        Doug Cutting added a comment -

        > Based on the version number in the boot header we can call into an rpc engine that supports the 20 format

        So the single motivating use case for the RPCEngine abstraction would not read the RPCEngine implementation-independent headers. That doesn't seem like a good abstraction.

        But we're not making any progress here. My suggestions are not affecting the design. So I'll stop trying to help, step away, and let you get on with it.

        I've stopped watching this issue. Good luck!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515289/ipcHeader3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.fs.viewfs.TestViewFsTrash

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/617//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/617//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/617//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515653/ipcHeader4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.fs.viewfs.TestViewFsTrash

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/618//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/618//console

        This message is automatically generated.

        Sanjay Radia added a comment -
        • Existing tests are sufficient
        • TestViewFsTrash runs fine on my desktop - ran it several times
          Running org.apache.hadoop.fs.viewfs.TestViewFsTrash
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.879 sec
          Results :
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515821/ipcHeader5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.fs.viewfs.TestViewFsTrash

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/624//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/624//console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        I believe the failure of TestViewFsTrash is not related to the patch. See HADOOP-8110.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516121/ipcHeader7.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/629//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516135/ipcHeader8.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.fs.viewfs.TestViewFsTrash

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/630//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/630//console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        Comments:

        1. Why use Server.IpcSerializationType in Client.java? There are other references to Server as well in the client. This can be cleaned up in a separate jira.
        2. connectionHeader is always null in Server#readAndProcess()
        3. Some indentation issues in IpcConnectionContext.proto
        4. Server#respondUnsupportedSerialization is sending VersionMismatch.class as error class.

        +1 with comments addressed.

        Sanjay Radia added a comment -

        Addressed the feedback.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516580/ipcHeader9.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed the unit tests build

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/644//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/644//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516586/ipcHeader10.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/645//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/645//console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1813 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1813/)
        HADOOP-7557 Make IPC header be extensible (sanjay radia) (Revision 1295261)

        Result = SUCCESS
        sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295261
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1888 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1888/)
        HADOOP-7557 Make IPC header be extensible (sanjay radia) (Revision 1295261)

        Result = SUCCESS
        sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295261
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1819 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1819/)
        HADOOP-7557 Make IPC header be extensible (sanjay radia) (Revision 1295261)

        Result = ABORTED
        sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295261
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #971 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/971/)
        HADOOP-7557 Make IPC header be extensible (sanjay radia) (Revision 1295261)

        Result = SUCCESS
        sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295261
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1006 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1006/)
        HADOOP-7557 Make IPC header be extensible (sanjay radia) (Revision 1295261)

        Result = SUCCESS
        sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295261
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #669 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/669/)
        HADOOP-7557. Merge r1295261 from trunk to 0.23 (Revision 1298245)

        Result = SUCCESS
        suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298245
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/dev-support/findbugsExcludeFile.xml
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ConnectionHeader.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/IpcException.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
        • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/proto/IpcConnectionContext.proto

          People

          • Assignee:
            Sanjay Radia
            Reporter:
            Sanjay Radia
          • Votes:
            0
            Watchers:
            13
