Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:
      None

      Description

      This is an umbrella issue for moving HDFS and MapReduce RPC to use Avro.

        Issue Links

          Activity

          Jeff Hammerbacher added a comment -

          Hey Doug,

          A design document and test plan would be quite interesting for this JIRA. Even a few paragraphs about which protocols you plan to move over to Avro and what is missing from Avro for each would be excellent.

          Thanks,
          Jeff

          Doug Cutting added a comment -

          Avro RPC will enhance Hadoop in two primary areas:

          • versioning: Hadoop's current RPC mechanism requires that clients and servers evolve in lockstep. Avro RPC permits clients and servers using different versions of protocols to still communicate. Avro doesn't automatically solve all versioning problems, but it does give one considerably better tools to solve them.
          • language-independence: Hadoop's RPC is currently Java-only, while Avro supports Java, Python, Ruby, C and C++ so far. A long-term goal for Hadoop is to permit native implementations of certain RPC protocols in other languages, e.g., perhaps mapreduce job submission and hdfs clients.

          The intended approach is to initially use Avro's reflection API tunneled over Hadoop RPC:

          • Reflection is used initially to minimize the impact on the codebase. Subsequently we should consider, protocol-by-protocol, switching to IDL-driven protocols, generating Java APIs with Avro's specific (code-generation) API.
          • Tunneling is used initially to leverage Hadoop's existing high-performance, secure RPC transport. Once Avro has an equivalent (AVRO-341) we can consider switching Hadoop to use that, to achieve language independence.

          Once reflection works correctly, we will have an Avro-based RPC protocol specification that correctly models Hadoop's protocols. This alone would address Hadoop's versioning issues, but it would not address language interoperability, since the Hadoop RPC tunnel used is not a cross-language standard. For language interoperability, we need AVRO-341.

          With HADOOP-6422 and HDFS-892, HDFS tests can now be run using Avro reflection. Some basic tests pass but many still fail. These failures will be addressed in HDFS-1066.
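
          As an illustration of what reflect-based RPC looks like in isolation (not the Hadoop tunnel itself), the sketch below assumes Avro's reflect IPC classes under roughly their current names and signatures; the EchoProtocol interface is invented for the example. The key point is that the protocol schema is inferred from a plain Java interface rather than written by hand.

          import java.net.InetSocketAddress;

          import org.apache.avro.ipc.netty.NettyServer;
          import org.apache.avro.ipc.netty.NettyTransceiver;
          import org.apache.avro.ipc.reflect.ReflectRequestor;
          import org.apache.avro.ipc.reflect.ReflectResponder;

          public class ReflectRpcSketch {
            /** A plain Java interface; Avro infers the protocol schema from it. */
            public interface EchoProtocol {
              String echo(String message);
            }

            public static void main(String[] args) throws Exception {
              // Server side: expose an implementation through a reflect responder.
              EchoProtocol impl = new EchoProtocol() {
                public String echo(String message) { return message; }
              };
              NettyServer server =
                  new NettyServer(new ReflectResponder(EchoProtocol.class, impl),
                                  new InetSocketAddress(0));

              // Client side: a dynamic proxy that speaks Avro on the wire.
              NettyTransceiver client =
                  new NettyTransceiver(new InetSocketAddress("localhost", server.getPort()));
              EchoProtocol proxy = ReflectRequestor.getClient(EchoProtocol.class, client);
              System.out.println(proxy.echo("hello"));

              client.close();
              server.close();
            }
          }

          In the tunneled approach described above, the Netty transport would be replaced by Hadoop's existing RPC transport, with the reflect requestor/responder handling serialization.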

          Sanjay Radia added a comment -

          A design document as Jeff suggested would be useful.

          I assume that the goal is still to make the RPC completely pluggable, i.e., until the Avro path is stable, the default config uses the Hadoop RPC with the old serialization and there is no negative performance impact on the old code paths.

          Doug Cutting added a comment -

          > A design document as Jeff suggested would be useful.

          What's unspecified in the current issues? If you like, I can collate the comments on the various issues linked here into a single document if that would make them more readable for you.

          > I assume that the goal is still to make the RPC completely pluggable, i.e., until the Avro path is stable, the default config uses the Hadoop RPC with the old serialization and there is no negative performance impact on the old code paths.

          RPC is already pluggable without negative performance impact. That was done in HADOOP-6422.

          I don't expect that switching to Avro serialization for RPCs will affect performance, but that should certainly be tested before we make Avro serialization the default. Switching the transport is more likely to affect performance, but that switch can be made separately and after switching serializations.
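
          For concreteness, the plug-point allows a caller to opt a single protocol into the Avro engine while every other protocol keeps the existing implementation. The sketch below assumes the per-protocol engine selection from HADOOP-6422 is exposed roughly as RPC.setProtocolEngine, backed by an "rpc.engine.<protocol-class>" configuration key; AvroRpcEngine and the choice of ClientProtocol are only illustrative.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hdfs.protocol.ClientProtocol;
          import org.apache.hadoop.ipc.AvroRpcEngine;
          import org.apache.hadoop.ipc.RPC;

          public class AvroEngineOptIn {
            /** Build a Configuration that routes one protocol through the Avro engine. */
            public static Configuration withAvroEngine() {
              Configuration conf = new Configuration();
              // Protocols not listed here keep the existing Writable-based engine,
              // so the default code paths are untouched.
              RPC.setProtocolEngine(conf, ClientProtocol.class, AvroRpcEngine.class);
              return conf;
            }
          }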

          So I see something like the following steps:

          • HDFS
            • get HDFS tests to pass using Avro RPC serialization
            • test HDFS performance using Avro RPC serialization
            • switch HDFS to use Avro RPC serialization by default
            • design, implement and switch HDFS to use IDL-driven Avro RPC (HDFS-1069)
          • MapReduce
            • get MapReduce tests to pass using Avro RPC serialization
            • test MapReduce performance using Avro RPC serialization
            • switch MapReduce to use Avro RPC serialization by default
            • design, implement and switch MapReduce to use IDL-driven Avro RPC (HDFS-1069)
          • Transport
            • design and develop an interoperable, secure, high-performance Avro transport (AVRO-341)
            • port HDFS and MapReduce to use this optionally
            • test HDFS and MapReduce with this new transport
            • switch HDFS and MapReduce to use the new transport by default

          Would it be useful to file a Jira issue for each of these?

          eric baldeschwieler added a comment -

          Hi Doug,

          -1

          The RPC system is fundamental to Hadoop's stability. Until your proposal has been formally documented, including an IDL, which I think is critical to the stated goal of backward compatibility, I'm not going to be comfortable seeing it committed. Beyond understanding the design, you need a testing plan and test results to review. Without this step, you will be throwing a huge tax onto others in the community. We have the downside of all this change and have to work hard to debug your work, yet we see no benefit.

          If you want to perform radical surgery on Hadoop, you need to convince the community that taking your work is worth the risk you are imposing on us. Bugs in Hadoop can take down businesses. In Yahoo's case, they can cost us millions of dollars in lost revenue.

          Perhaps you can proceed as you suggest on a branch where you will not disrupt other work or impose an unfair testing burden on the project?

          E14

          Doug Cutting added a comment -

          Eric, this work is proceeding using a "branch in place" strategy, since long-lived branches are onerous to manage. The intent is to be largely equivalent to a branch, in that existing code will not be disturbed. The first step was to make the RPC functionality we wish to replace pluggable. That was committed last September in HADOOP-6170. Now we can develop and test the new RPC implementation without disturbing the existing implementation. There is no intent to switch to the new implementation until all concerns about it are resolved. Some minor, non-functional changes have been and will be made to the existing implementation to aid IDL inference, i.e. adding some annotations and renaming some private methods (HDFS-1077).
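
          For example, one such non-functional change is adding Avro's reflect annotations to existing protocol interfaces so that schema inference knows which values may legitimately be null. A sketch with an invented interface and method name:

          import org.apache.avro.reflect.Nullable;

          public interface AnnotatedProtocolExample {
            // Without @Nullable, Avro reflection infers a non-null string schema and
            // would fail at runtime when the server legitimately returns null.
            @Nullable
            String getPreferredHost(String path);
          }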

          > Until your proposal has been formally documented, including an IDL, which I think is critical to the stated goal of backward compatibility, I'm not going to be comfortable seeing it committed.

          This issue does not attempt to fully implement cross-version wire-compatibility. Nor does it attempt to move Hadoop to IDL-driven RPC. Rather, it proposes to move Hadoop to a system that we can use to implement wire-compatibility and IDL-driven RPC.

          Until we switch, we cannot dictate the IDL; the IDL is derived from the existing Java interfaces used to define RPC protocols. Once we've switched to the new Avro-based implementation, then we can elect, protocol-by-protocol, client-by-client and server-by-server, to switch to an IDL-driven implementation. (An IDL derived from the current Java interfaces is attached to HDFS-1069. This is in JSON form, but Avro has an alternate IDL syntax that, by the time we consider switching, would be used instead.)

          Changes to the protocol often imply changes to client and server logic. This issue does not propose to alter client and server logic, but rather to track and retain the existing logic and protocols.

          Before we make a release that we claim will support wire-compatibility we should certainly review our protocols carefully. But first we need to switch to an RPC system that permits clients and servers to use different protocol versions and that supports IDL-driven protocols. That's the primary focus of this issue. Once we switch, our wire-compatibility problems will not be automatically solved; rather, we'll then have tools in place to address them.

          > Beyond understanding the design, you need a testing plan and test results to review.

          Yes, I agree, this should be fully tested to everyone's satisfaction before we make any switch.

          eric baldeschwieler added a comment -

          Hi Doug,

          Thanks, I think I understand your "branch in place" proposal. I think you need to understand my position. I'm responsible for making sure the Hadoop code base is stable enough that we can run our production work on it. If we move incrementally as you propose, my customers will be exposed to a lot of risk and disruption for no return. Yahoo QA staff and developers will be pulled off what they are working on to complete this project. That is not how we want to allocate our resources.

          That is why I'm rejecting your current proposal and suggesting that you instead do more work on an actual branch, including getting the IDL-based RPC scheme coded. Once you build something that has actual return for my customers, then we will be happy to help with the final debug and tuning, but first I would like to see you complete the project. Only then do I want to see something that affects every single RPC call incorporated into the code base.

          I understand that this is not how you would like to proceed and I understand that this is not historically how you would have proceeded with a project like this in Hadoop. But, as Hadoop evolves and the number of businesses built on it increases, the way we approach radical change in the code base needs to change as well. I assert that to proceed with something as radical as this, you need the consensus of Hadoop committers behind you. You do not have it.

          Please understand that I am supportive of your desire to incorporate an Avro-based RPC into Hadoop. I just feel that you need to take a more cautious approach. Otherwise you tax the entire community, and I don't support that.

          Thanks!

          Doug Cutting added a comment -

          Eric, the fundamental change that adds "risk" has already been committed to trunk, in HADOOP-6422, permitting one to swap in a different RPC implementation via the configuration. After that, until we make the switch, no changes should be made that affect runtime behavior. So if you have a problem with that change, please have one of your committers veto and revert it and I will stop working on this.

          Sanjay Radia added a comment -

          While Avro reflection will make it easier to get Avro into Hadoop's wire protocol, I believe that Avro IDL-driven protocols are necessary for wire compatibility. Why?

          • A protocol needs to be designed to be compatible. Serialization technologies like PB and Avro allow one to add/delete fields easily; I like that, but it can be misleading. I don't think we designed the current Hadoop protocols carefully with an eye towards compatibility.
            • For example, as part of HDFS-1052 we would like to extend the blockId to have an additional field.
              Avro would help us do that very easily, but it would not work unless the client side treats the blockId as an opaque object that is NOT deserialized and is simply passed uninterpreted to the DNs. I think there are many such examples.
          • RMI and Hadoop RPC make it too easy to pass any Java object that one is using internally across the wire. Avro using reflection will continue that. One needs to examine every object that is being passed across the wire and decide whether it is necessary and what its type should be. (See the sketch after this comment.)
          • PB and Avro are very powerful and useful tools - unfortunately a reflection-based approach makes badly designed protocols appear to be good, because it gives you the impression that your protocol is magically compatible; and it mostly is, but it can miss corner cases and encourages creating messy protocols that expose too many types.

          Hence I propose that we take every single Hadoop protocol and design it for compatibility using Avro IDL. (HDFS-1069, MAPREDUCE-1689) Based on what I have read, I believe Doug agrees with the above.
          Switching to IDL, however, is not a pluggable change - hence this part needs to be done in a branch.

          The question I have been struggling with is: what are the benefits of the reflection scheme, since I assert that the resulting protocols cannot be declared as "the Hadoop wire-compatible protocols"? The only benefit is that the reflection-based protocols can be used for cross-language access to Hadoop.
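
          A hypothetical sketch of the type-leak risk above; every name is invented, and the nested class stands in for any server-internal type that ends up in a protocol signature:

          public interface ReflectedProtocolExample {

            /** Internal bookkeeping type; under reflection every field becomes wire format. */
            class InternalBlockState {
              long blockId;
              long generationStamp;
              int pendingReplicationCount; // server-internal detail, now an implicit wire contract
            }

            // Reflection serializes InternalBlockState as-is. An IDL-first design would
            // force an explicit, reviewed record here, e.g. an opaque block handle that
            // the client passes back to datanodes without ever deserializing it.
            InternalBlockState getBlockState(String path, long offset);
          }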

          Sanjay Radia added a comment -

          Doug says:
          > switch HDFS [and MR] to use Avro RPC serialization by default.
          Given that the resulting protocols are not the official wire-compatible protocols, why change the default until we have completed the move to Avro IDL?

          Doug Cutting added a comment -

          > AVRO IDL-driven protocols are necessary for wire compatibility.

          I agree. As I said above, before we declare that we support wire-compatibility we should perform a careful audit of our RPC protocols. With the approach I've suggested, this can largely be pursued in parallel. Until we switch, we can examine the reflected protocol and work to improve the Java interfaces (or at least develop a list of proposed improvements).

          > what are the benefits of the reflection scheme

          By not branching we can continue development of HDFS and MapReduce in parallel while we develop and test a new RPC serialization and transport. As mentioned above, once we've switched to Avro serialization using reflection, we can begin, protocol-by-protocol, switching each to an IDL-based approach. Each protocol can be addressed in a separate issue with no massive branch required. We might, e.g., prioritize client-facing protocols first, so that we can support wire compatibility of clients before we support rolling cluster upgrades. We can even separate updating clients from updating servers. Once we've completed the transition to an IDL-driven system, then we can, protocol-by-protocol, method-by-method, work to improve the IDL to the point where we're willing to declare our support of wire-compatibility. At no point is trunk broken or are large areas blocked from changes and fixes.

          Sanjay Radia added a comment -

          My viewpoint is in partial agreement with Doug and Eric:

          1. Implement the pluggable RPC in trunk and it needs to be well tested (agree with Doug)
          2. Implement the AVRO IDL based protocols in a branch(es). (agree with Eric).
          3. Only after step 2 do we declare the new protocols to be wire compatible in future. (we are all in agreement here)
          4. After step 1, leave the default as the current RPC - but any customer can easily change this with a config variable. (Disagree with Doug.)

          There is low risk in #1 as it is pluggable.
          Moving to Avro IDL will change many data types in our servers and cannot be pluggable. This needs to be well tested (correctness and performance) and cannot cross releases; hence it is best done in a branch(es). HDFS and MR can be separate branches; however, I don't think we can split the HDFS protocols, as they use many common data types (block-id, block-token, etc.).

          Doug Cutting added a comment -

          Sanjay, I'm pleased to see we have so few differences. I leave for a one week vacation tomorrow morning, and look forward to working with you more on this when I return.

          Arun C Murthy added a comment -

          I think Sanjay's proposal is reasonable, and I'm glad to see we are close to consensus on this important issue.

          eric baldeschwieler added a comment -

          If we can get to agreement that the current RPC remains the default and that IDL work happens in a branch and that only after we have a reasonably complete backwards compat solution do we change to AVRO RPC by default, then I think we are good.

          My concern is not with the already committed pluggability, but with the change of the default RPC and incremental changeover of the protocols. Once that work starts, we are committed to finishing it, and we cannot do that in the timeframe of the next release IMO. I'd love to be proven wrong in a branch...

          Would we keep the pluggable API once we make the transition? Seems like pluggable support of an IDL-driven RPC will be hard / not valuable. Thoughts?

          Doug Cutting added a comment -

          Sanjay> Moving to Avro IDL will change many data types in our servers and cannot be pluggable.
          Sanjay> I don't think we can split the HDFS protocols as they use many common data types (block-id, block-token, etc).

          I agree that it may not be easy to do this incrementally, but I think it would be much easier than trying to do it in a branch. For example, existing datatypes like Block that are used in many protocols could be made to implement Avro's SpecificRecord interface so that both IDL and reflect-based code can use them. Then we could more easily consider porting protocols one-by-one. Once we've ported all protocols that use Block, then we could, as a single patch, replace it with an IDL-generated version.
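
          A minimal sketch of that bridging, assuming Avro's SpecificRecord interface (getSchema/get/put, addressed by field position) with current class names; the schema literal and field set here are illustrative and omit Block's existing Writable methods:

          import org.apache.avro.Schema;
          import org.apache.avro.specific.SpecificRecord;

          public class Block implements SpecificRecord {
            private static final Schema SCHEMA = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Block\",\"fields\":["
                + "{\"name\":\"blockId\",\"type\":\"long\"},"
                + "{\"name\":\"numBytes\",\"type\":\"long\"},"
                + "{\"name\":\"generationStamp\",\"type\":\"long\"}]}");

            private long blockId;
            private long numBytes;
            private long generationStamp;

            @Override public Schema getSchema() { return SCHEMA; }

            // SpecificRecord addresses fields by their position in the schema.
            @Override public Object get(int field) {
              switch (field) {
                case 0: return blockId;
                case 1: return numBytes;
                case 2: return generationStamp;
                default: throw new IndexOutOfBoundsException("field " + field);
              }
            }

            @Override public void put(int field, Object value) {
              switch (field) {
                case 0: blockId = (Long) value; break;
                case 1: numBytes = (Long) value; break;
                case 2: generationStamp = (Long) value; break;
                default: throw new IndexOutOfBoundsException("field " + field);
              }
            }
          }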

          Eric> If we can get to agreement that the current RPC remains the default and that IDL work happens in a branch and that only after we have a reasonably complete backwards compat solution do we change to AVRO RPC by default, then I think we are good.

          If Avro reflect-based RPC passes all tests, including performance, reliability, etc., why wouldn't we switch to it? Even without using an IDL, this would let client and server versions differ. We certainly don't want to encourage independent 3rd party implementations of protocols until we're happy that the protocols are what we intend to support long-term, but I don't yet see a reason not to switch once Avro-based RPC is functionally equivalent.

          Eric> My concern is not with the already committed plugability, but with the change of the default RPC and incremental change over of the protocols. Once that work starts, we are committed to finishing it and we can not do that in the timeframe of the next release IMO.

          I don't follow. If we proceed incrementally, thoroughly testing changes before committing them, then we should be able to release at any time, no?

          Eric> Would we keep the plugable API once we make the transition?

          I see no reason to keep it.

          Folks are welcome to start IDL-based branch(es) if they prefer to operate that way. In that case, I will cease work on support for an incremental approach, as it would be redundant. I fear that, since such branches would be long-lived, they'll prove very painful to maintain, especially if they make substantial changes to protocols. We should not prohibit trunk changes to the HDFS and MapReduce protocols.

          eric baldeschwieler added a comment -

          Hi Doug,

          I do not support an incremental approach of protocol porting. The idea of supporting parallel implementations of our current HDFS protocols is mind boggling, as you've told me several times during Hadoop's development.

          All of this change would be a huge tax on folks trying to stabilize Hadoop for release.

          I agree doing this in a branch would be a huge job. This project is a huge job. That is my concern. It's not fair on the community to implicitly sign us all up to finish something of this scale. If you are signing up to do the project, great! We will help you close once you can demonstrate you are close to the finish line.

          I don't support throwing the whole community into a state where we need to finish this to ship good releases before anyone understands how we are going to finish it or who will finish it or how much work it will be.

          Thanks for taking the time to understand my concerns!

          E14 - via iPhone


          Arun C Murthy added a comment -

          Doug, the concern I have is that an incremental, protocol-by-protocol approach exposes us to the risk that a late-breaking bug will leave the projects (HDFS/Map-Reduce) in a state where we can't move forward or back, since we have too many changes. I'm sure you see this.

          Having said that, I do understand it's harder to do this in a branch. However, you will find a lot of support to ease the pain - I would be willing to volunteer to help on the Map-Reduce IDL work in the branch, and I'm sure we will find others willing to pitch in for HDFS etc. That way we could get the system working end-to-end with Avro, test at scale and merge and switch swiftly. Does that sound reasonable?

          Doug Cutting added a comment -

          > exposes us to the risk that a late-breaking bug will leave the projects (HDFS/Map-Reduce) in a state where we can't move forward or back

          We should test each change thoroughly before we commit it. We should not commit changes we feel are excessively risky and insufficiently tested. I do not propose committing any changes that are not agreed to be safe.

          Changing the serialization used for RPC is not a change in semantics, it's a change in syntax. As such, this change alone should not introduce any deep bugs. It may introduce NullPointerException-like bugs, but these are generally easy to fix. It should not introduce any new synchronization or other logic bugs. Thus I think the chance of a deep bug that goes undetected and will take a great investment to fix is minimal.

          If others disagree and wish to develop this as a branch they are welcome to do so, but I do not think we should freeze changes to protocols in trunk while they work in this branch.

          Doug Cutting added a comment -

          > I would be willing to volunteer to help on the Map-Reduce IDL work in the branch

          I am happy to kibitz on a branch. But I would only embrace a branched strategy if I felt I could dedicate myself full-time to that one effort together with similarly dedicated colleagues, so that the time on the branch could be minimized. Unfortunately, I cannot dedicate myself full-time to such an endeavor. If others can, then they might pursue this, but I cannot in good faith; I would still be happy to assist however I can and will certainly not be an obstacle to such a plan.


            People

            • Assignee: Unassigned
            • Reporter: Doug Cutting
            • Votes: 0
            • Watchers: 49
