Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The most important place for wire-compatibility in DFS is between clients and the cluster, since lockstep upgrade is very difficult and a single client may want to talk to multiple server versions. So, I'd like to focus this JIRA on making the RPCs between the DFS client and the NN/DNs wire-compatible using protocol buffer based serialization.

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Todd Lipcon added a comment -

          Here's my proposal:

          • in a HADOOP jira, modify ObjectWritable to be able to serialize protocol buffers (i.e. the Message type). This allows coexistence of Writables and protobufs within a single protocol, so we can do a staged switchover without a wholesale switch in RPC engine implementation.
          • for each of the RPCs in ClientProtocol:
            • define a new protobuf for the request params and the response params, e.g. SetReplicationRequestProto and SetReplicationResponseProto
            • add a new RPC SetReplicationResponseProto setReplication(SetReplicationRequestProto param)
            • implement that RPC by simply writing the boilerplate wrap/unwrap code to/from the original parameters
            • deprecate the old call
            • switch callers internal to HDFS to the new proto-based call
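
A rough sketch of the wrap/unwrap boilerplate in the steps above. The message classes here are hand-written stand-ins for what protoc would generate (the real generated classes have builder APIs), and the method bodies are placeholders, so this is illustrative only:

```java
// Hand-written stand-ins for the protoc-generated message classes; the
// real generated classes would use builders, but the wrap/unwrap shape
// is the same. Illustrative sketch only.
class SetReplicationRequestProto {
    final String src;
    final short replication;
    SetReplicationRequestProto(String src, short replication) {
        this.src = src;
        this.replication = replication;
    }
}

class SetReplicationResponseProto {
    final boolean result;
    SetReplicationResponseProto(boolean result) { this.result = result; }
}

public class ProtoRpcSketch {
    // Existing Writable-based RPC method (placeholder body).
    static boolean setReplication(String src, short replication) {
        return replication >= 1;
    }

    // New proto-based RPC: pure boilerplate that unwraps the request,
    // delegates to the original method, and wraps the result.
    static SetReplicationResponseProto setReplication(SetReplicationRequestProto req) {
        return new SetReplicationResponseProto(setReplication(req.src, req.replication));
    }

    public static void main(String[] args) {
        SetReplicationResponseProto resp =
            setReplication(new SetReplicationRequestProto("/some/path", (short) 3));
        System.out.println(resp.result);
    }
}
```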

          The nice thing about this approach is that it (a) maintains API backwards-compatibility for one version for folks directly using protocol proxies to talk to the NN, and (b) allows us to do a staged switchover, one RPC at a time.

          It doesn't address the ability to evolve the underlying IPC/RPC transport, but that could be tackled in parallel or sequenced after this.

          Todd Lipcon added a comment -

          Just spent a few minutes and did getBlockLocations() as an example.

          Does this seem reasonable to folks?

          Todd Lipcon added a comment -

          We had a bit of discussion about this at the contributors meeting a few weeks ago (the week of the summit). My takeaways from that meeting were:

          • Several people expressed an opinion that it would be nicer to not have protobuf-specific code in any HDFS classes. Sidd described the approach used in MR2. If I understood him correctly, it uses a class structure like:
          interface FooWireType {
            long getBlah();
            void setBlah(long x);
            ... getters and setters ...
  ... serialization/deserialization stuff?...
          }
          
          class FooWireTypeProtoImpl implements FooWireType {
            // wraps FooWireProto, which is the generated class
          }
          
          interface WireTypeFactory {
            FooWireType createFooType();
            BarWireType createBarWireType();
          }
          
          class WireTypeProtoFactory implements WireTypeFactory {
            // returns *ProtoImpl implementations
          }
          

          The upside of this approach is that it would be possible to switch serialization mechanisms (e.g. to Avro or Thrift) without changing any of the code in the DFS layer – just implement a different WireTypeFactory. The downside is that it requires a bunch of boilerplate interfaces and classes to be constructed. It would be possible to do this via code-gen, but no one has a working code generator at this point.
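
A minimal, self-contained sketch of that decoupling, condensing the interfaces above to one type and one factory (all names hypothetical): DFS-layer code never names a protobuf class, and the concrete factory is the single point where the serialization framework is chosen.

```java
// Hypothetical sketch: DFS-layer code depends only on the interfaces;
// the concrete factory is the one place protobuf (or Avro, or Thrift)
// gets picked.
interface FooWireType {
    long getBlah();
    void setBlah(long x);
}

interface WireTypeFactory {
    FooWireType createFooType();
}

// Protobuf-backed implementation; a plain field stands in for the
// wrapped, generated FooWireProto builder.
class FooWireTypeProtoImpl implements FooWireType {
    private long blah;
    public long getBlah() { return blah; }
    public void setBlah(long x) { this.blah = x; }
}

class WireTypeProtoFactory implements WireTypeFactory {
    public FooWireType createFooType() { return new FooWireTypeProtoImpl(); }
}

public class DfsLayerSketch {
    // DFS code stays serialization-agnostic: swapping frameworks means
    // passing a different WireTypeFactory here, nothing else changes.
    static long roundTrip(WireTypeFactory factory, long value) {
        FooWireType t = factory.createFooType();
        t.setBlah(value);
        return t.getBlah();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new WireTypeProtoFactory(), 42L));
    }
}
```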

          • I argued that, while the above is nicer, it would be more expedient in the short term to just implement this based on protobufs. I already summarized my reasoning in this comment. The one-sentence version is that we need to move forward ASAP on this, and having something that works now is better than taking months to do something slightly more general.

          So, I would like to propose moving forward with the approach I outlined in this JIRA and the demonstration patch. I can commit time to doing this. If others find the approach unsatisfactory and can commit time to doing the more general mechanism on trunk in the short term, that would be great, but I don't want to put off client compatibility much longer. I also don't think we should move forward with the general mechanism until we have a reasonable code-gen infrastructure ready – it's just too much boilerplate to write and maintain.

          Arun C Murthy added a comment -

          Todd, the advantage of the approach in MAPREDUCE-279 is not merely pluggability – in fact, pluggability is only a side-benefit.

          Architecturally, the biggest benefit is that it provides a clean separation between on-wire and in-memory types. I'd argue that having them be the same is the easiest way to break compatibility – not just 'serialization' or 'protocol' compatibility, but semantic compatibility. It's too easy to add a field and have it show up in the server/client; hence my suggestion to separate the on-wire and in-memory types à la MAPREDUCE-279.

          Plus, this provides the added benefit of not being too tied to a particular serialization framework's idiosyncrasies.
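
As a hedged illustration of that separation (all names hypothetical): a field added to the wire type stays invisible to code that only sees the in-memory type, because the translator is the single place the two meet.

```java
// In-memory type used throughout the DFS layer (hypothetical name).
class BlockInfo {
    final long blockId;
    BlockInfo(long blockId) { this.blockId = blockId; }
}

// On-wire type (would be generated); note the extra wire-only field,
// which application code never sees.
class BlockInfoWire {
    final long blockId;
    final int wireVersion; // wire-level detail, invisible in-memory
    BlockInfoWire(long blockId, int wireVersion) {
        this.blockId = blockId;
        this.wireVersion = wireVersion;
    }
}

// The translator is the single, explicit point where the two meet,
// so wire changes cannot silently leak into application semantics.
public class WireTranslator {
    static BlockInfoWire toWire(BlockInfo b) {
        return new BlockInfoWire(b.blockId, 1);
    }
    static BlockInfo fromWire(BlockInfoWire w) {
        return new BlockInfo(w.blockId);
    }

    public static void main(String[] args) {
        BlockInfo b = fromWire(toWire(new BlockInfo(7L)));
        System.out.println(b.blockId);
    }
}
```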

          Allen Wittenauer added a comment -

          Are protobufs endian-aware? In other words, if I have a SPARC NN and Intel DNs, will everything work or will it fail like it does now? (Having a non-Intel box for the NN is highly desirable due to TLB issues...)

          Todd Lipcon added a comment -

          Yes, the protobuf code is careful to convert endianness where necessary. Re the TLB issues: don't recent versions of Java use huge pages when running on recent Linux kernels with transparent huge page support? That should increase the TLB hit rate significantly.
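
Concretely, protobuf's fixed-width wire types are defined as little-endian on the wire regardless of the host CPU's byte order, which is why a SPARC NN and Intel DNs can interoperate. A minimal sketch of that wire rule:

```java
// Sketch of the protobuf fixed32 wire rule: the low byte is always
// written first, independent of the host's byte order.
public class EndianSketch {
    static byte[] encodeFixed32(int v) {
        return new byte[] {
            (byte) (v & 0xFF),
            (byte) ((v >>> 8) & 0xFF),
            (byte) ((v >>> 16) & 0xFF),
            (byte) ((v >>> 24) & 0xFF),
        };
    }

    public static void main(String[] args) {
        byte[] b = encodeFixed32(0x01020304);
        System.out.printf("%02x %02x %02x %02x%n", b[0], b[1], b[2], b[3]);
    }
}
```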

          Allen Wittenauer added a comment -

          This isn't a Linux thing – it's a processor thing. Large (or huge) pages on most Intel CPUs are around 2MB. Chips like UltraSPARC and Itanium can do 256MB, and POWER5 can do crazy things like 16GB.

          Doug Cutting added a comment -

          I agree that the MR2-style protocol wrappers are a good pattern, but I'm not convinced they're required before this could be committed. The current Writable-based RPC implementation is not wrapped, nor does it provide version interoperability. This patch addresses version-interoperability but not wrapping. Wrapping seems like it might be handled as an independent issue, especially if version-interoperability is a priority in the near term.

          Todd Lipcon added a comment -

          This was fixed by other work that moved RPC to protobuf.


            People

            • Assignee:
              Todd Lipcon
            • Reporter:
              Todd Lipcon
            • Votes: 2
            • Watchers: 20

              Dates

              • Created:
                Updated:
                Resolved:

                Development