Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The most important place for wire-compatibility in DFS is between clients and the cluster, since lockstep upgrade is very difficult and a single client may want to talk to multiple server versions. So, I'd like to focus this JIRA on making the RPCs between the DFS client and the NN/DNs wire-compatible using protocol buffer based serialization.

        Issue Links

        1. ClientNameNodeProtocol in Protocol Buffers (Sub-task, Resolved, Sanjay Radia)
        2. ClientDatanodeProtocol in Protocol Buffers (Sub-task, Resolved, Unassigned)
        3. ClientNameNodeProtocol Translators for Protocol Buffers (Sub-task, Resolved, Sanjay Radia)
        4. Turn off TestDFSOverAvroRpc (Sub-task, Resolved, Suresh Srinivas)
        5. Implement protobuf service for NamenodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        6. Implement protobuf service for JournalProtocol (Sub-task, Resolved, Suresh Srinivas)
        7. Implement protobuf service for InterDatanodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        8. Implement protobuf service for ClientDatanodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        9. Protobuf - Add protobuf service for DatanodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        10. Protobuf - Add protobuf service for NamenodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        11. Protobuf - Add protocol service for JournalProtocol (Sub-task, Resolved, Suresh Srinivas)
        12. Protobuf - Add protobuf service for InterDatanodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        13. Enable protobuf RPC for InterDatanodeProtocol, ClientDatanodeProtocol, JournalProtocol and NamenodeProtocol (Sub-task, Resolved, Suresh Srinivas)
        14. Enable protobuf RPC for DatanodeProtocol (Sub-task, Resolved, Jitendra Nath Pandey)
        15. Optional parameters are not handled correctly (Sub-task, Resolved, Suresh Srinivas)
        16. Remove TestDFSOverAvroRpc (Sub-task, Resolved, Suresh Srinivas)
        17. Protobuf translators for DatanodeProtocol (Sub-task, Resolved, Jitendra Nath Pandey)
        18. TestBackupNode fails (Sub-task, Resolved, Suresh Srinivas)
        19. Enable protobuf rpc for ClientNamenodeProtocol (Sub-task, Resolved, Sanjay Radia)
        20. Move RefreshAuthPolicy, RefreshUserMappings, GetUserMappings protocol to protocol buffers (Sub-task, Resolved, Jitendra Nath Pandey)
        21. Tests are failing with ClassCastException, due to new protocol changes (Sub-task, Resolved, Suresh Srinivas)
        22. Fix RPC client creation bug from HDFS-2459 (Sub-task, Resolved, Suresh Srinivas)
        23. Provide a method in client-side translators to check for methods supported in the underlying protocol (Sub-task, Resolved, Jitendra Nath Pandey)

          Activity

          Todd Lipcon added a comment -

          This was fixed by other work to move RPC to protobuf

          Doug Cutting added a comment -

          I agree that the MR2-style protocol wrappers are a good pattern, but I'm not convinced they're required before this could be committed. The current Writable-based RPC implementation is not wrapped, nor does it provide version interoperability. This patch addresses version-interoperability but not wrapping. Wrapping seems like it might be handled as an independent issue, especially if version-interoperability is a priority in the near term.

          Allen Wittenauer added a comment -

          This isn't a Linux thing. It's a processor thing. Large (or Huge) pages on most Intel CPUs are around 2MB. Something like an UltraSPARC or Itanium can do 256MB. Power5's can do crazy things like 16GB.

          Todd Lipcon added a comment -

          Yes, the protobuf code is careful to convert endianness where necessary. Re TLB issues, don't recent versions of Java use HugePages when running on recent Linux kernels that provide transparent huge page support? That'll increase the TLB hit rate significantly.

          Allen Wittenauer added a comment -

          Are protobufs endian-aware? In other words, if I have a SPARC NN and Intel DNs, will everything work or will it fail like it does now? (Having a non-Intel box for the NN is highly desirable due to TLB issues...)

          Arun C Murthy added a comment -

          Todd, the advantage of the approach in MAPREDUCE-279 is not merely pluggability - in fact, pluggability is merely a side-benefit.

          Architecturally the biggest benefit is that it provides a clean separation between on-wire and in-memory types. I'd argue having them both be the same is the easiest way to break compatibility, not just 'serialization' or 'protocol' compatibility, but semantic compatibility. It's too easy to add a field and have it show up in the server/client, thus my suggestion to separate the on-wire and in-memory types ala MAPREDUCE-279.

          Plus, this provides the added benefit of not being too tied to a particular serialization framework's idiosyncrasies.
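
          The separation Arun describes can be sketched in a few lines. This is a minimal illustration only: the class names (`BlockInfo`, `BlockProto`, `toWire`) are invented for this example and are not real HDFS or protoc-generated classes. The point is that data crosses the in-memory/on-wire boundary only through an explicit conversion, so adding an internal field cannot silently change the wire format.

```java
// Illustrative sketch (invented names): keeping the in-memory type and
// the on-wire type distinct, with an explicit conversion between them.
public class TypeSeparationDemo {
    // In-memory type: free to grow fields for internal bookkeeping.
    static class BlockInfo {
        long blockId;
        long lastAccessTime; // internal-only detail; never serialized
    }

    // On-wire type: any change here is a deliberate protocol change.
    static class BlockProto {
        final long blockId;
        BlockProto(long blockId) { this.blockId = blockId; }
    }

    // Explicit conversion is the only path onto the wire.
    static BlockProto toWire(BlockInfo info) {
        return new BlockProto(info.blockId);
    }

    public static void main(String[] args) {
        BlockInfo info = new BlockInfo();
        info.blockId = 1001L;
        info.lastAccessTime = System.currentTimeMillis();
        System.out.println(toWire(info).blockId); // prints 1001
    }
}
```

Note that `lastAccessTime` never reaches `BlockProto`; with a single shared type, that field would have shipped to clients automatically.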

          Todd Lipcon added a comment -

          We had a bit of discussion about this at the contributors meeting a few weeks ago (the week of the summit). My takeaways from that meeting were:

          • Several people expressed an opinion that it would be nicer to not have protobuf-specific code in any HDFS classes. Sidd described the approach used in MR2. If I understood him correctly, it uses a class structure like:
          interface FooWireType {
            long getBlah();
            void setBlah(long x);
            ... getters and setters ...
            ... serialization/deserialization stuff? ...
          }
          
          class FooWireTypeProtoImpl implements FooWireType {
            // wraps FooWireProto, which is the generated class
          }
          
          interface WireTypeFactory {
            FooWireType createFooType();
            BarWireType createBarWireType();
          }
          
          class WireTypeProtoFactory implements WireTypeFactory {
            // returns *ProtoImpl implementations
          }
          

          The upside of this approach is that it would be possible to switch serialization mechanisms (e.g. to Avro or Thrift) without changing any of the code in the DFS layer – just implement a different WireTypeFactory. The downside is that it requires a bunch of boilerplate interfaces and classes to be constructed. It would be possible to produce these via code-gen, but no one has a working code generator at this point.

          • I argued that, while the above is nicer, it would be more expedient in the short term to just implement this based on protobufs. I already summarized my reasoning in this comment. The one-sentence version is that we need to move forward ASAP on this, and having something that works now is better than taking months to do something slightly more general.

          So, I would like to propose moving forward with the approach I outlined in this JIRA and the demonstration patch. I can commit time to doing this. If others find the approach unsatisfactory and can commit time to doing the more general mechanism on trunk in the short term, that would be great, but I don't want to put off client compatibility much longer. I also don't think we should move forward with the general mechanism until we have a reasonable code-gen infrastructure ready – it's just too much boilerplate to write and maintain.
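
          The factory pattern sketched above can be exercised as follows. This is a hedged, self-contained mock-up: `WireTypePojoFactory` and the plain-field implementation are invented stand-ins (a real protobuf-backed factory would return wrappers around generated classes), but it shows how DFS-layer code would depend only on the interfaces.

```java
// Sketch of the MR2-style factory pattern (invented stand-in names).
// DFS-layer code sees only FooWireType and WireTypeFactory; swapping
// serialization frameworks means swapping the factory, nothing else.
public class WireTypeFactoryDemo {
    interface FooWireType {
        long getBlah();
        void setBlah(long x);
    }

    // One possible backing implementation; a protobuf-backed one
    // would wrap the generated FooWireProto instead of a plain field.
    static class FooWireTypePojoImpl implements FooWireType {
        private long blah;
        public long getBlah() { return blah; }
        public void setBlah(long x) { this.blah = x; }
    }

    interface WireTypeFactory {
        FooWireType createFooType();
    }

    static class WireTypePojoFactory implements WireTypeFactory {
        public FooWireType createFooType() { return new FooWireTypePojoImpl(); }
    }

    // "DFS-layer" code, written purely against the interfaces.
    static long roundTrip(WireTypeFactory factory, long value) {
        FooWireType foo = factory.createFooType();
        foo.setBlah(value);
        return foo.getBlah();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new WireTypePojoFactory(), 42L)); // prints 42
    }
}
```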

          Todd Lipcon added a comment -

          Just spent a few minutes and did getBlockLocations() as an example.

          Does this seem reasonable to folks?

          Todd Lipcon added a comment -

          Here's my proposal:

          • in a HADOOP JIRA, modify ObjectWritable to be able to serialize protocol buffer (i.e. Message) types. This allows coexistence of Writables and protobufs within a single protocol, so we can do a staged switchover without a wholesale switch in RPC engine implementation.
          • for each of the RPCs in ClientProtocol:
            • define a new protobuf for the request params and the response params, e.g. SetReplicationRequestProto and SetReplicationResponseProto.
            • add a new RPC: SetReplicationResponseProto setReplication(SetReplicationRequestProto param)
            • implement that RPC by simply writing the boilerplate wrap/unwrap code to/from the original parameters
            • deprecate the old call
            • switch callers internal to HDFS to the new proto-based call

          The nice thing about this approach is that it (a) maintains API backwards-compatibility for one version for folks directly using protocol proxies to talk to the NN, and (b) allows us to do a staged switchover, one RPC at a time.

          It doesn't address the ability to evolve the underlying IPC/RPC transport, but that could be tackled in parallel or sequenced after this.
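
          The wrap/unwrap boilerplate the proposal describes would look roughly like this. This is a sketch, not actual HDFS code: the *Proto classes here are plain stand-ins for what protoc would generate, and the Writable-era `setReplication(String, short)` body is a placeholder for the real NameNode logic.

```java
// Sketch of the proposed staged switchover for one RPC (invented
// stand-ins for the protoc-generated request/response messages).
public class SetReplicationTranslatorDemo {
    static class SetReplicationRequestProto {
        final String src;
        final int replication;
        SetReplicationRequestProto(String src, int replication) {
            this.src = src;
            this.replication = replication;
        }
    }

    static class SetReplicationResponseProto {
        final boolean result;
        SetReplicationResponseProto(boolean result) { this.result = result; }
    }

    // Existing Writable-era method (placeholder body).
    static boolean setReplication(String src, short replication) {
        return replication > 0;
    }

    // New proto-based RPC: unwrap the request, delegate to the
    // original implementation, wrap the result.
    static SetReplicationResponseProto setReplication(SetReplicationRequestProto req) {
        boolean ok = setReplication(req.src, (short) req.replication);
        return new SetReplicationResponseProto(ok);
    }

    public static void main(String[] args) {
        SetReplicationResponseProto resp =
            setReplication(new SetReplicationRequestProto("/foo", 3));
        System.out.println(resp.result); // prints true
    }
}
```

Because the proto-based method simply delegates, the old and new RPCs can coexist while callers are migrated one at a time.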


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes: 2
            • Watchers: 20
