Hadoop Common / HADOOP-6904

A baby step towards inter-version RPC communications

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.23.0
    • Component/s: ipc
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed

      Description

      Currently, RPC communication in Hadoop is very strict. If a client has a different version from that of the server, a VersionMismatch exception is thrown and the client cannot connect to the server. This forces us to update both client and server all at once whenever an RPC protocol changes. But sometimes a version difference does not mean the client and server are incompatible. It would be nice if we could relax this restriction and support inter-version communication.

      My idea is that DfsClient catches the VersionMismatch exception when it connects to the NameNode. It then checks whether the client and the server are compatible. If yes, it records the NameNode version in the DFS client and lets the client continue talking to the NameNode. Otherwise, it rethrows the VersionMismatch exception.
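
      In code, the proposed client-side handling would look roughly like the sketch below. This is only an illustration of the idea: isCompatible and the surrounding field names are hypothetical, and the committed patch evolved into a method-fingerprint handshake instead (see the discussion below).

          // Sketch of the proposed DfsClient-side handling (hypothetical names).
          try {
            this.namenode = createRPCNamenode(nameNodeAddr, conf);
          } catch (RPC.VersionMismatch e) {
            long serverVersion = e.getServerVersion();
            if (isCompatible(ClientProtocol.versionID, serverVersion)) { // hypothetical check
              this.serverVersion = serverVersion; // remember the server's version and go on
            } else {
              throw e; // truly incompatible: rethrow
            }
          }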

      1. rpcCompatible-trunk11.patch
        43 kB
        Hairong Kuang
      2. rpcCompatible-trunk10.patch
        43 kB
        Hairong Kuang
      3. rpcCompatible-trunk9.patch
        38 kB
        Hairong Kuang
      4. rpcCompatible-trunk8.patch
        38 kB
        Hairong Kuang
      5. rpcCompatible-trunk7.patch
        37 kB
        Hairong Kuang
      6. rpcCompatible-trunk6.patch
        37 kB
        Hairong Kuang
      7. rpcCompatible-trunk5.patch
        37 kB
        Hairong Kuang
      8. rpcCompatible-trunk4.patch
        36 kB
        Hairong Kuang
      9. rpcCompatible-trunk2.patch
        34 kB
        Hairong Kuang
      10. rpcCompatible-trunk1.patch
        29 kB
        Hairong Kuang
      11. rpcCompatible-trunk.patch
        22 kB
        Hairong Kuang
      12. majorMinorVersion1.patch
        19 kB
        Hairong Kuang
      13. majorMinorVersion.patch
        16 kB
        Hairong Kuang
      14. rpcVersion1.patch
        18 kB
        Hairong Kuang
      15. rpcVersion.patch
        17 kB
        Hairong Kuang


          Activity

          Luke Lu added a comment -

          > VersionedProtocol keeps the old API: getProtocolVersion for backwards compatibility purposes.

          The patch added a new method, getProtocolSignature, to the VersionedProtocol interface, which breaks (at compile time and run time) all existing code that uses VersionedProtocol.

          Looking at the code added to the mapreduce protocols, it seems that the default getProtocolSignature impl doesn't need the interface change, i.e., the API breakage doesn't seem necessary in the default cases. For people who want to map a protocol to a different impl than the default, an additional static ProtocolSignature.bind(String protocol, Type impl) method could be used to establish the mapping at class init time.
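
          A sketch of the suggested registration API (hypothetical; no such bind method exists in the patch):

              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;

              // Hypothetical ProtocolSignature.bind: establish at class init time which
              // implementation a protocol name should be fingerprinted against.
              public class ProtocolBindings {
                private static final Map<String, Class<?>> BINDINGS =
                    new ConcurrentHashMap<String, Class<?>>();

                public static void bind(String protocol, Class<?> impl) {
                  BINDINGS.put(protocol, impl);
                }

                static Class<?> lookup(String protocol) {
                  return BINDINGS.get(protocol); // null means: use the default impl
                }
              }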

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/)

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #597 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/597/)
          MAPREDUCE-2300. Fix TestUmbilicalProtocolWithJobToken on trunk after HADOOP-6904. Contributed by Todd Lipcon

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #595 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/595/)

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #492 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/492/)

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #590 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/590/)
          HADOOP-6904. Support method-based RPC compatibility. Contributed by Hairong Kuang.

          Konstantin Shvachko added a comment -

          You should have waited until the common changes were integrated by Hudson, then committed hdfs and mapreduce, as those do not compile until then.

          Hairong Kuang added a comment -

          I've just committed this!

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12469706/rpcCompatible-trunk11.patch
          against trunk revision 1064403.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/206//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/206//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/206//console

          This message is automatically generated.

          Hairong Kuang added a comment -

          This patch fixes the javac warnings.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12469591/rpcCompatible-trunk10.patch
          against trunk revision 1063613.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 1079 javac compiler warnings (more than the trunk's current 1049 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/202//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/202//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/202//console

          This message is automatically generated.

          Hairong Kuang added a comment -

          RpcCompatible-trunk10.patch has a few more changes. Most of them were made either to require less code change in the rest of the Hadoop code or to be less disruptive to current Hadoop deployments.
          1. VersionedProtocol keeps the old API, getProtocolVersion, for backwards compatibility purposes. I just found out that HBase RPC uses VersionedProtocol too, so keeping the old API allows old Hadoop RPC clients to continue to talk to the new RPC server.
          2. Change PROTOCOL_FINGERPRINT_CACHE to be a HashMap and instead make getSigFingerprint synchronized on the map.
          3. Add a new static method in ProtocolSignature which all RPC servers could use (a usage sketch follows the declaration below):

            public static ProtocolSignature getProtocolSigature(VersionedProtocol server,
                String protocol,
                long clientVersion, int clientMethodsHash) throws IOException;
          

          4. Keep the old getProxy/waitProxy APIs to allow less code change to the rest of the Hadoop code.
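
          For illustration, a server implementing VersionedProtocol could delegate its signature method to this helper. A minimal sketch, assuming the helper declaration above:

              // Sketch: an RPC server delegating to the static ProtocolSignature helper.
              @Override
              public ProtocolSignature getProtocolSignature(String protocol,
                  long clientVersion, int clientMethodsHash) throws IOException {
                return ProtocolSignature.getProtocolSigature(
                    this, protocol, clientVersion, clientMethodsHash);
              }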

          Hairong Kuang added a comment -

          Sanjay, this is an optimization. If the client-side methods hash is the same as the server-side methods hash, the server does not send the list of methods back to the client.
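
          In code, the server-side check amounts to something like the following sketch (the two-argument constructor shape is assumed from the patch discussion):

              // If the client's methods fingerprint matches the server's, skip
              // shipping the method hash list back to the client.
              if (clientMethodsHash == serverFingerprint) {
                return new ProtocolSignature(serverVersion, null); // match: no list needed
              }
              return new ProtocolSignature(serverVersion, serverMethodHashes);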

          Sanjay Radia added a comment -

          > it in addition sends the hashcode of client-side protocol methods;
          Hairong, what is the reason for sending the client-side methods to the server?
          Can't the client deal with the versioning problem all on its side?

          Doug Cutting added a comment -

          +1 This patch looks reasonable to me. Thanks, Hairong!

          Hairong Kuang added a comment -

          > Is there a reason why you don't just add the protocol's fingerprint as a field of ProtocolSignature
          ProtocolSignature is shipped to the client side, and the client does not need to know the server's fingerprint.

          But anyway, I fixed the code so that the hash codes of the server protocol's methods do not get computed twice in the case of a cache fault.

          Doug Cutting added a comment -

          Is there a reason why you don't just add the protocol's fingerprint as a field of ProtocolSignature, instead of defining a wrapper class? The methods are also still hashed twice in the cache fault case, but that's probably not a significant problem.

          Hairong Kuang added a comment -

          OK, this patch makes the protocol cache hold not only the protocol's fingerprint but also its signature.

          I still keep the class name as the key. Depending on implementation details to guarantee that the class object is a singleton is risky. Besides, getting a class object's name is not expensive, since it is cached.

          Doug Cutting added a comment -

          It now looks like every time ProtocolSignature.getProtocolSignature() is called with a mismatched client fingerprint, it still calls Class#getMethods(), still rehashes each method, still creates two arrays, a new ProtocolSignature instance, etc.

          I believe that using Class as a hashtable key could work. With the default classloader there's only a single Class instance for a given class name, so using identity on Class instances should be equivalent to equality on class names. And if an application is using a non-default classloader then Class is perhaps a better hash key for a set of methods than class name.

          Could we add the protocol's fingerprint to ProtocolSignature, so that the cache can be Map<Class,ProtocolSignature>? I think this might also simplify the logic in ProtocolSignature.getProtocolSignature().
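
          A sketch of the cache shape suggested here, assuming the fingerprint became a field of ProtocolSignature (ProtocolSignature.compute is a hypothetical factory that hashes the interface's methods once):

              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;

              // Cache keyed on the Class object itself, as suggested.
              private static final Map<Class<?>, ProtocolSignature> SIGNATURE_CACHE =
                  new ConcurrentHashMap<Class<?>, ProtocolSignature>();

              static ProtocolSignature getSignature(Class<?> protocol) {
                ProtocolSignature sig = SIGNATURE_CACHE.get(protocol);
                if (sig == null) {
                  sig = ProtocolSignature.compute(protocol); // hypothetical: hash methods once
                  SIGNATURE_CACHE.put(protocol, sig);        // benign race: value is deterministic
                }
                return sig;
              }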

          Hairong Kuang added a comment -

          rpcCompatible-trunk7.patch has the following fixes:
          1. Rewrite ProtocolSignature#getProtocolSignature to avoid a possibly unnecessary call to getMethods.
          2. Use Java autoboxing and unboxing whenever possible.

          I still keep the protocol fingerprint cache as it is. Class does not have its own hashCode method, so it uses the default, which is based on the Class object reference. Using it as the key does not work. Changing the value to be ProtocolSignature also does not work, because ProtocolSignature does not contain the protocol's fingerprint.

          Doug Cutting added a comment -

          This is looking good!

          The cache might better be a Map<Class,ProtocolSignature>. Then you could avoid calling Class#getMethods(), 'new ProtocolSignature()', etc. for most calls to ProtocolSignature.getProtocolSignature(). You might add a ProtocolSignature(Class) constructor, and change the other constructor to either ProtocolSignature(int, int[]) or perhaps just ProtocolSignature(int), since null is always passed as the second parameter.

          Also, a very minor nit, do you need to call Integer.valueOf() and/or intValue()? Wouldn't autoboxing do the right thing there?

          Hairong Kuang added a comment -

          rpcCompatible-trunk6.patch makes the hashcode cache final and changes it to a ConcurrentHashMap.

          Hairong Kuang added a comment -

          > not to verify/choose it
          I meant not to verify/use it.

          Hairong Kuang added a comment -

          > If the client must provide an interface name that the server then uses to identify the set of methods to fingerprint.
          The client must provide an interface name, but the server may choose not to verify/use it. This decision simply makes the server implementation more flexible. getProtocolSignature is called only once, when the client connects to the server. I do not think we should sacrifice flexibility for performance.

          Look at the unit test TestRPCCompatibility#TestImpl0 that I provide. It does not use the client-provided protocol name.

              @Override
              public ProtocolSignature getProtocolSigature(String protocol,
                  long clientVersion, int clientMethodsHashCode)
              throws IOException {
                Class<? extends VersionedProtocol> inter;
                try {
                  inter = (Class<? extends VersionedProtocol>)getClass().getGenericInterfaces()[0];
                } catch (Exception e) {
                  throw new IOException(e);
                }
                return ProtocolSignature.getProtocolSignature(clientMethodsHashCode, getVersion(), inter);
              }
          
          Doug Cutting added a comment -

          If the client must provide an interface name that the server then uses to identify the set of methods to fingerprint, then I don't see that adding the interface name to the fingerprint function would change semantics. Or am I missing something? It might even simplify some things, e.g., if the client's protocol fingerprint incorporated its protocol interface name, then the fingerprint alone might be used to look up cached server fingerprints. Are there protocols where we don't intend to use service authentication, where clients don't provide the protocol interface name? If so, that might argue against using the interface name in the protocol fingerprint. Are there other reasons?

          In the latest patch, we might better use ConcurrentHashMap for the fingerprint cache. Also, we should probably mark the cache 'static final' and name it something like FINGERPRINT_CACHE.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467934/rpcCompatible-trunk5.patch
          against trunk revision 1056006.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/164//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/164//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/164//console

          This message is automatically generated.

          Hairong Kuang added a comment -

          rpcCompatible-trunk5.patch incorporates Doug's review comments:
          1. Fix the bug in hashCode(Method) and add a unit test for this bug;
          2. Rename all the method-hashing methods from hashCode to getFingerprint (but not in the javadoc);
          3. Add a cache that maps a protocol to its hashcode, to avoid repeated calculation of the protocol's fingerprint.

          dhruba borthakur added a comment -

          > Should the hashing consider just those methods in the interface named by the client?

          I would vote "yes" to the above question.

          Hairong Kuang added a comment -

          > As for including the protocol name, ...
          Normally a server implementation should hash only those methods in the interface named by the client.

          But in general, a server could consider a client compatible even if it uses a protocol with a different name, as long as that protocol has a method with the same signature.

          So I think not including the declaring class name as part of the method hash is the more general solution.

          Doug Cutting added a comment -

          Looks like there's a bug in hashCode(Method) where the method name is ignored. It might be good to add a test for that: same return type and parameters, but different name.

          It also might create less confusion if these methods were not called hashCode(), but rather getSignature() or getFingerprint(). According to Wikipedia, this use is as a fingerprint. http://en.wikipedia.org/wiki/Hash_function

          As for including the protocol name, service authorization requires client protocol names to be sent. Servers often implement multiple protocols, a superset of the client's protocol interface. Should the hashing consider just those methods in the interface named by the client? In general, if one uses a protocol with a different name that has a method with the same signature, do we want that to be considered compatible, or only when the server implements the protocol the client indicates?

          We might also need to cache protocol hash values. In the current patch they're recomputed for each proxy instance created and, on the server, for each client that connects. That computation may be significant.
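
          A corrected per-method fingerprint along these lines might look like the sketch below (the patch's actual hashing details may differ):

              import java.lang.reflect.Method;

              // Hash the method name, return type, and parameter types, but not the
              // declaring class (see the protocol-name discussion above).
              static int methodFingerprint(Method m) {
                int hash = m.getName().hashCode();
                hash = 31 * hash + m.getReturnType().getName().hashCode();
                for (Class<?> param : m.getParameterTypes()) {
                  hash = 31 * hash + param.getName().hashCode();
                }
                return hash;
              }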

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467905/rpcCompatible-trunk4.patch
          against trunk revision 1056006.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/163//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/163//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/163//console

          This message is automatically generated.

          Hairong Kuang added a comment -

          This patch additionally does the following:
          1. Make sure the hashcode of an array of methods is independent of the methods' order in the array.
          2. A method's hashcode also includes its return type.
          3. Add a unit test for method hashcodes.

          Hairong Kuang added a comment -

          Doug, super good point! serialVersionUID is a good option, but I do not like the fact that it also hashes the class name and interface class names. In our use case, what matters most is the method signatures.

          Doug Cutting added a comment -

          I just remembered that Class#getMethods() does not return methods in any particular order, and I believe that Arrays#hashCode() is order-dependent. So the methods need to be sorted by name before they are hashed.

          Note that Java Serialization defines a good hashing method as the default definition for serialVersionUID:

          http://download.oracle.com/javase/6/docs/platform/serialization/spec/class.html#4100

          This can be accessed with ObjectStreamClass.lookup(iface).getSerialVersionUID(). Unfortunately the protocol has to extend java.io.Serializable, and we'd need to check that explicitly, as this method returns 0 for non-Serializable interfaces, but this still might be a good way to get a well-designed hash function for interfaces.
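
          The lookup described would be along these lines (a sketch, with the explicit Serializable check mentioned above):

              import java.io.ObjectStreamClass;
              import java.io.Serializable;

              // Default serialVersionUID as an interface fingerprint. Only valid for
              // Serializable types; lookup() returns null for anything else.
              static long serialFingerprint(Class<?> iface) {
                if (!Serializable.class.isAssignableFrom(iface)) {
                  throw new IllegalArgumentException(iface.getName() + " is not Serializable");
                }
                return ObjectStreamClass.lookup(iface).getSerialVersionUID();
              }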

          Owen O'Malley added a comment -

          With 93 versions of the protocol there is either a 0% or 100% chance of collision. Probabilities of collisions only matter with an open set of versions. Since protocol versions are by definition a closed set, we only care if there is a collision. In this case, there will likely be a closed set of 1 to 5 versions. 32 bits is fine for such a purpose.

          Doug Cutting added a comment -

          About hash sizes: Java hashcodes are designed for hash tables, a collision-tolerant purpose. This is not a collision-tolerant use. A collision in protocol hashes would cause confusing failures.

          Looking at:

          http://en.wikipedia.org/wiki/Birthday_problem#Probability_table

          With 32-bit hashes, if there are 93 versions of a protocol, then there's a 1 in a million chance (p=10^-6) of a collision, assuming a very good hash function. Those are probably acceptable odds: a protocol might have 93 variations, but there probably won't be anywhere near a million protocols. If we used 64-bit hashes, then we could have millions of versions of millions of protocols before there's a decent chance of a collision. That's a lot better. In Avro we use 128-bit MD5 hashes to identify protocols, which is probably overkill. So I'm okay with 32-bit hashes here, but I'd be happier with 64. A 64-bit string hash is easy to write:

          http://stackoverflow.com/questions/1660501/what-is-a-good-64bit-hash-function-in-java-for-textual-strings
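
          The 64-bit function referenced there is essentially String.hashCode() widened to a long accumulator:

              // 64-bit string hash from the linked answer: the same 31-multiplier
              // polynomial as String.hashCode(), seeded with a large prime.
              static long hash64(String s) {
                long h = 1125899906842597L; // prime seed
                for (int i = 0; i < s.length(); i++) {
                  h = 31 * h + s.charAt(i);
                }
                return h;
              }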

          Hairong Kuang added a comment -

          rpcCompatible-trunk2.patch moves ProtocolProxy and ProtocolSignature to be top-level classes.

          Hairong Kuang added a comment -

          Thanks Doug for your review. The new patch rpcCompatible-trunk1.patch addressed your and Dhruba's comments.

          Good catch on the Method hashcode calculation! I fixed it in the new patch and changed the unit test to make sure that overloaded methods have different hashcodes.

          I also incorporated the other comments, except for making method hashcodes longs, because the Java standard is for hashcodes to be of type int. Let me know if you insist on longs.

          This patch makes changes to some unit tests due to the interface change of getProxy.

          I will also upload this patch to the review board.

          Doug Cutting added a comment -

          Looks good. Some comments on the current patch:

          • Method#hashCode() hashes the method's declaring class and the method name. Rather, we probably want a hash of the method name (since we only send that on the wire) and the names of the types of its parameters. I guess we require that if someone changes the serialization of a parameter, they'll manually increment the protocol version number?
          • longs might be better to use than ints for method hashcodes, to better prevent collisions
          • can ProtocolServer become ProtocolServer<T> where T is the proxied interface?
          • most of the implementation of TestRPCCompatibility.testImpl0#getProtocolVersion might be a public static utility method, like RPC.getProtocolSignature(Class<?> interface).
          • The terms ProtocolVersion and ProtocolInfo are both used for the same concept. We might rename ProtocolInfo to ProtocolSignature and getProtocolVersion to getProtocolSignature?
          Hairong Kuang added a comment -

          Review board: https://reviews.apache.org/r/224/

          Hairong Kuang added a comment -

          OK, I finally got some time to work on this. Here is the first draft.
          1. getProtocolVersion has a new signature:
          a. in addition, it sends the hashcode of the client-side protocol methods;
          b. it returns the server's version number and an array of hash codes of the server protocol's methods if the client protocol differs from the server-side protocol.
          2. getProxy returns a new type, ProtocolServer, which contains:
          a. the proxy used to communicate with the server;
          b. a method to check whether a client-side method is supported on the server side.
          3. A unit test, TestRPCCompatibility, to illustrate how to write compatible RPCs (client usage sketched below).
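
          In outline, a client under this draft obtains the wrapper and can probe per-method support before calling. A sketch with the draft's names (ProtocolServer was later renamed ProtocolProxy; the method name probed here is illustrative):

              ProtocolServer<ClientProtocol> server =
                  RPC.getProxy(ClientProtocol.class, versionID, addr, conf);
              ClientProtocol namenode = server.getProxy();            // the actual RPC proxy
              boolean canUse = server.isMethodSupported("getStats");  // per-method check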

          Doug Cutting added a comment -

          > Can we use a hash of client side and server side to optimize getting the method names?

          The description of how Avro handles this is at:

          http://avro.apache.org/docs/current/spec.html#handshake

          Also, if there is no longer a major/minor distinction, perhaps it should just be called the 'version'?

          Sanjay Radia added a comment -

          Looks like we are reaching closure. The only thing left is how to manage the cost of sending the method names. Can we cache it across connects? Can we use a hash of the client-side and server-side methods to optimize getting the method names?

          dhruba borthakur added a comment -

          >Dhruba, did you buy the use of the Major number along with method names?

          Yes

          > My current thought is that the Major number changes when you delete a method, change the signature of a method, or the serialization of a method.

          +1

          Hairong Kuang added a comment -

          > The client side does not know what is newly added or deleted.
          I meant that the upper layer, like the DFS client, makes the check for newly added/deleted methods. The upper layer may choose not to do the check, as you suggested. The RPC layer only provides the set of supported method names.

          Sanjay Radia added a comment -

          The client side does not know what is newly added or deleted. All it has is the client-side and server-side lists of methods. The client also knows that the major number has not changed (i.e., getProxy succeeded), so the calls that work should be callable.

          • For the optimized methods, an upper layer (say HDFS or DfsClient) will check whether the optimized method is available and, if not, will call the non-optimized method; for this, one has to consult the list of server methods. This matches the previous M-m# approach, where one compared the minor number to see if the server supports the optimized method (see my list example above).
          • For the rest of the methods, simply make the call; consulting the list of server methods is an optimization for the failure case. Recall that with the previous M-m# approach we did NOT compare the m# but simply made the call.

          (Dhruba, did you buy the use of the Major number along with method names?)

          Hairong Kuang added a comment -

          > we have to do it for each call.
          I meant for each of the newly added/deleted ones.

          Hairong Kuang added a comment -

          > Exactly when will the method names be consulted? On each call or only when a optimizedFoo is being called instead of the regular foo?

          If we assume that not all new methods are the optimized ones, we have to do it for each call, right?

          Sanjay Radia added a comment -

          > ... addition of an all-new method name .... the only permitted, compatible kind of change to a protocol.
          Yes.

          Sanjay Radia added a comment -

          Normally the deletion of a method is considered a break in compatibility. Doug, I hadn't thought about the selective changing of the major number as you suggest. That could work.

          So we are coming to the conclusion that one can use a major number plus method names.
          Exactly when will the method names be consulted? On each call, or only when an optimizedFoo is being called instead of the regular foo?

          Doug Cutting added a comment -

          > Major number changes when you delete a method, change the signature of the method, of the serialization of the method.

          Perhaps it would be simpler to state when it doesn't change. Are you suggesting that the "major" version changes for any protocol change but the addition of an all-new method name? That's the only permitted, compatible kind of change to a protocol? That seems reasonable.

          Note that, if a method is removed, one can decide on a case-by-case basis between incrementing the version and generating a runtime error when the method is called. For an obsolete core method that should no longer be called by compatible clients, incrementing the major version would be appropriate. For a seldom-called method that can no longer be supported, one might leave the version alone and generate a runtime error, permitting old clients to still operate in the majority of cases.

          Sanjay Radia added a comment -

          > .. But adding or removing a method would not require a change of this [major #] value.
          In the original Major-minor proposal a deletion of a method does require a change in Major#.

          • Use case 1. Remove method foo() because we think it is not needed and we are willing to break client apps.
  Old clients connect and realize that the Major# has changed and disconnect. Old applications cannot run against the new server. We detect this, not when foo() is called, but when
  getProxy() is called. If the server has broken compatibility I want clean failure rather than a partial failure when the missing method is called.
          • Use case 2. Service has a method bar and we add method fasterBar(). getProxy succeeds because it is a compatible change.
            New client side checks, at the time of the method call, to see if fasterBar is supported. If not then calls bar instead.

          For both use cases there is a mismatch of the list of methods. But in case 2 it is okay (bar is fall-back for fasterBar).
In use case 1 it is not okay. I guess one could argue that we should let the app run and if it uses the removed method then, and only then, fail. I am uncomfortable about that but am willing to be convinced. The point is that the two approaches are NOT equivalent.

My current thought is that the Major number changes when you delete a method, change the signature of a method, or the serialization of a method.

          Hairong Kuang added a comment -

Doug, yes you are right. Using a hash may not work well if a new method is added with a new type of parameters. Declaring it manually should give us better control of compatibility.

          So we vote for the method names solution, right?

          Doug Cutting added a comment -

          Hairong> Major numbers should be updated if we change the (de)serialization format.

          Okay, I think I get it. The "major version" is updated whenever a parameter list changes or the serialization of a parameter changes. But adding or removing a method would not require a change of this value. If that's right, then perhaps we should call it something like "format version"? With Avro, this could be the hash of the protocol, but with Writable it would have to be the hash of the read and write method bytecodes of each parameter, which would be too precise, so we instead declare it manually. Do I have this right? Thanks!

          Hairong Kuang added a comment -

          Major numbers should be updated if we change the (de)serialization format.

          Doug Cutting added a comment -

          If we have method names, why would major numbers be needed at all? If method semantics change then the method should be renamed. What utility would major version numbers then have over a list of methods? If the issue is simply performance, then perhaps a hash of the method names could be sent instead of the full list?

          Hairong Kuang added a comment -

getProtocolVersion will always send the major version number. This is for judging whether the client and server are compatible or not.

          We could either send minor version or a list of method names. From a programmer's point of view, using method names seems less error-prone.

          dhruba borthakur added a comment -

> The real difference is the following: if you check during getProxy() and a method is missing then how does one assert that it is still okay to connect because the missing method is actually a new method and the call can return a non-implemented exception;

          I like the idea of not checking anything during getProxy. Like you said, maybe a method is missing from the server, but it is also likely that the client is not even interested in making that call. Instead, we should check at every call (as you explained via the code piece in list()).

Hairong seems to say that she will make the server send method names over and above M-m numbers; I am OK with that idea.

          Sanjay Radia added a comment -

          @Dhruba
          >major/minor numbers can be considered an optimization, isn't it?
The M-m# allows one to assert compatibility; the other way is via method name comparison. I see them as complementary rather than
as an optimization. Without the major-minor number one has to do a check on each call or a check on getProxy. The real difference is the following: if you check during getProxy() and a method is missing then how does one assert that it is still okay to connect because the missing method is actually a new method and the call can return a non-implemented exception; with M-m# one can assert this: when the M# has not changed, one is assured that the method was not deleted, so it must be a new method.

          If you look at the example of list() call I gave above, the method names give a minor simplification of the code.
I have no issues with passing the method names as long as we also optimize by not sending the list when the version numbers match. I also believe we will have to cache the results across reconnects (BTW we had discussed the same optimization for
          Avro RPC except that it uses the schema checksum).

          Hairong Kuang added a comment -

          If we assume that each rpc has a distinct name in a protocol, we could simply ship an array of method names together with the version number as a result of getProtocolVersion. At the client side, we could store the server-supported method names as a hashmap. The client could do a method name lookup before issuing a RPC to the server side.
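
As a rough illustration of this (hypothetical names, not the patch itself), the cached lookup could be as simple as the following Java sketch; a HashSet suffices when only the names are stored:

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;

  // Hypothetical sketch: cache the method names shipped back with the
  // version number, and look a name up before issuing the RPC.
  class ServerMethodCache {
    private final Set<String> names;

    ServerMethodCache(String[] methodNames) {   // as returned by the server
      this.names = new HashSet<String>(Arrays.asList(methodNames));
    }

    boolean hasMethod(String methodName) {
      return names.contains(methodName);
    }
  }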

          dhruba borthakur added a comment -

          > Returning methods that supported at the server side along with getProtocolVersion is a good solution to support across branch client/server RPCs.
          > Dhruba is especially interested in this approach

          I really like the idea of making the server return a list of method signatures to the client. Then the client can dynamically decide how best to make the RPC. If we do that, then the need to have major/minor numbers can be considered an optimization, isn't it?

          Sanjay Radia added a comment -

          This patch looks fine. Please update HDFS-1335 as Doug suggested.
          In HDFS-1335, how about starting the Major# of HDFS as 22 to signify the protocol as of Hadoop-22.
          After that we will only make minor changes till we move to Avro.

          Sanjay Radia added a comment -

          @doug
          > .. protocol introspection on the client and server. That would better integrate with Avro, should we ever decide to switch to that.
          With Avro you get that for free.
Originally I was in favor of adding the method inspection to this jira; but if you look at my comment above, it does not make
much of a difference for the client - the client needs an if statement either way. Most of the benefit comes from the Major-minor number, as
it eliminates all the if statements related to minor changes to the protocol.
          I could live with doing method names later when we integrate with Avro (as Doug suggests).

          Doug Cutting added a comment -

Hairong, the latest patch seems like it could be reasonable, but it lacks context for me to evaluate it. Perhaps you could also update HDFS-1335 or maybe provide a test case that defines two versions of a simple protocol and shows how this would be used to adapt them?

          I like the idea that Sanjay presents of permitting protocol introspection on the client and server. That would better integrate with Avro, should we ever decide to switch to that.

          Hairong Kuang added a comment -

          A new patch is attached. This is a little bit different from Sanjay's summary:
          1. getProxy or waitProxy returns an object that contains the reference to the RPC proxy and the server version #. The caller of getProxy/waitProxy could cache the server version# by itself.
          2. getProtocolVersion will always get the server version from the server side.

Returning the methods that are supported at the server side along with getProtocolVersion is a good solution to support cross-branch client/server RPCs. Dhruba is especially interested in this approach. Let's get back to this after we put more thought into it. For this jira, I prefer to go with the version# for now.
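
For point 1, a minimal sketch of such a wrapper object (the names here are assumptions, not necessarily what the patch uses):

  // Hypothetical: getProxy/waitProxy return this instead of the bare proxy,
  // so the caller can cache the server version# itself.
  class ProtocolProxy<T> {
    private final T proxy;             // the underlying RPC proxy
    private final long serverVersion;  // fetched once at connect time

    ProtocolProxy(T proxy, long serverVersion) {
      this.proxy = proxy;
      this.serverVersion = serverVersion;
    }

    T getProxy() { return proxy; }
    long getServerVersion() { return serverVersion; }
  }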

          Sanjay Radia added a comment -

          On further thought, the Major-minor removes most of the switch statements. Having the list of the server method names simplifies the if statement:

          With M-m:

  list() {
    if (m >= xxx) {
      dfsclient.iterateList(..)
    } else {
      dfsclient.list(...)
    }
  }

With method names:

  if (server.hasMethod(iterateListSignature)) {
    dfsclient.iterateList(..)
  } else {
    dfsclient.list(...)
  }
          Sanjay Radia added a comment -

          Hairong, to summarize our discussion today:
getProtocolVersion() could return the server method names; we can optimize this away if the version numbers match exactly.
This would allow the client side to determine what is supported on the server side without having a messy switch statement on the V#.

          The main issue you raised is how do we return this to the caller of getProxy(). The two options we discussed were

          • when Proxy#getProtocolVersion() is called, the proxy can return the cached info that it obtained when getProtocolVersion() was first called.
          • let getProxy() return an object that contains the proxy object and also the M#, m# and the method names/signatures.
          Hairong Kuang added a comment -

This patch supports major/minor versions of an RPC protocol.

          Sanjay Radia added a comment -

          >But for the trunk we've already had so many incompatible changes, we might just go for an incompatible solution.
You are right - trunk has broken compatibility with previous versions and this is too hard to fix now.
Thus we can add the major-minor right now in trunk so that we can use it going forward (till Avro/PB is added to RPC).
I.e., leave the old getProtocolVersion in and have it return the major number (and deprecate the method),
and add a new method to return major and minor.

          Hairong Kuang added a comment -

          Yes, I agree that with some efforts we can make the major/minor idea backward compatible.

          But for the trunk we've already had so many incompatible changes, we might just go for an incompatible solution.

          Sanjay Radia added a comment -

          > Would we be able to use the major and minor version number to automate this.
Hairong, you had mentioned that unfortunately doing this will not be backwards compatible.
          How about adding a new method called getMajorMinorVersion()
          If that method fails you fall back to the getProtocolVersion()
          This means going forward we have a slightly simpler model for dealing with compatibility.
          (Of course when Avro is inserted then we are not likely to need the version numbers at all.)

          Hairong Kuang added a comment -

          > Would we be able to use the major and minor version number to automate this.

          I like this idea!

          Sanjay Radia added a comment -

          >Yes, it is going to be a hard coded compatibility list.
Is it something in the dfs-client, or config info that the rpc layer uses?

          Would we be able to use the major and minor version number to automate this.
i.e., if the major numbers mismatch then incompatible, otherwise compatible.

Clearly the special calls that invoke older, less optimized methods when talking to an old server have to be hard-coded on the client side.
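
In code, the automated rule amounts to something like this minimal sketch (names assumed):

  // Mismatched major numbers mean incompatible; a difference only in the
  // minor number is treated as compatible.
  static boolean isCompatible(int clientMajor, int serverMajor) {
    return clientMajor == serverMajor;
  }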

          Sanjay Radia added a comment -

> But the new client will get the "no such method" exception and will know that he is talking to an older client.
I meant - But the new client will get the "no such method" exception and will know that he is talking to an older server.

          Hairong Kuang added a comment -

          > Slight improvement to your design:
          > If a new server knows that it is compatible with an older client then it should accept the client's version.
          > If the client is new then the decision can only be made on the client side.

          Sanjay, this is exactly what I did in my patch to HDFS-1335. Here is the algorithm for getProtocolVersion:

If the client is old and the server is new:
the server checks compatibility. If not compatible, it throws VersionIncompatibleException and the connection fails.
Otherwise, it returns the server version; the RPC client will throw a VersionMismatch. The DFS client catches VersionMismatch but ignores it because the server is newer. The connection succeeds.

If the client is new and the server is old:
the server simply returns the server version;
the RPC client throws VersionMismatch. The DFS client catches VersionMismatch and then performs a compatibility check; if not compatible, it throws VersionIncompatible and the connection fails. Otherwise, the connection succeeds.
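
A rough sketch of the server-side half of that flow, assuming the ProtocolCompatible helper and VersionIncompatible exception discussed in this thread (hypothetical, not the HDFS-1335 patch itself):

  public long getProtocolVersion(String protocol, long clientVersion)
      throws IOException {
    long serverVersion = ClientProtocol.versionID;
    if (clientVersion < serverVersion
        && !ProtocolCompatible.isCompatible(clientVersion, serverVersion)) {
      // old client, new server, and the pair is known to be incompatible
      throw new VersionIncompatibleException(protocol, clientVersion,
                                             serverVersion);
    }
    // Otherwise return the server version. If it differs from the client's,
    // the RPC client throws VersionMismatch and the DFS client decides
    // whether to proceed (the new-client/old-server case).
    return serverVersion;
  }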

          Hairong Kuang added a comment -

> But the new client will get the "no such method" exception and will know that he is talking to an older client.
Sanjay, I think you meant the old client will get the "no such method" exception and will know he is talking to an older server.

Yes, this adds one more RPC and does not seem elegant. Furthermore:

Suppose a protocol has a method x, and we add an improved version of x, called y, to the protocol. Later on we add an improved version of y, called z. Neither change bumps the version #. When a client's call to z fails, it does not know whether it should fall back to y or to x.

I think bumping up the version number is good; it clearly indicates that a change was made to the protocol. It should be the application itself that decides how to deal with the change in a compatible way.

          Sanjay Radia added a comment -

          >If we do not bump the version number, old client can continue to talk to a new server, but we also want a new client to talk to an old server.
But the new client will get the "no such method" exception and will know that he is talking to an older client.

(BTW I am not at this point suggesting we modify our practice of changing version #s yet.
For example, with Avro or PBs, if you call an old server with a new method you simply get back a "no such method" exception.
Correct, Doug?)

          Sanjay Radia added a comment -

          Slight improvement to your design:
          If a new server knows that it is compatible with an older client then it should accept the client's version.
          If the client is new then the decision can only be made on the client side.

          Hairong Kuang added a comment -

          This framework supports both backward & forward compatibility.

I agree with both you and Doug that this does not exclude Avro. For changes like adding a parameter to an RPC, Avro makes it much simpler.

          Hairong Kuang added a comment -

          Yes, Sanjay, you got the idea! I am happy to see your thoughtful ideas here.

          > If the changes were compatible then why did we bump the version number in the first place.

          If we do not bump the version number, old client can continue to talk to a new server, but we also want a new client to talk to an old server.

          Hairong Kuang added a comment -

As a practice, I rewrote HDFS-946, improve listStatus performance, in a backward compatible way. The idea is that if the NN is an old server, the client sends the old RPCs getListing or getFileInfo, which send back FileStatus with full path names. If the NN is a new server, the client sends the new RPCs, getHdfsListing or getHdfsFileInfo, which send back HdfsFileStatus that carries only a local name in byte format.

Potentially HDFS-202 could also be written in a backward compatible way. If the server is new, the client sends one request that gets FileStatus together with locations; otherwise, the client first gets FileStatus and then sends a separate RPC for block locations.
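
A hypothetical sketch of that fallback on the client side; the version cutoff and the makeQualified helper are made up for illustration, this is not the HDFS-946 code:

  FileStatus[] listPaths(String src) throws IOException {
    if (serverVersion >= 61) {                    // server knows the new RPC
      HdfsFileStatus[] compact = namenode.getHdfsListing(src);
      FileStatus[] full = new FileStatus[compact.length];
      for (int i = 0; i < compact.length; i++) {
        full[i] = makeQualified(compact[i], src); // rebuild the full path name
      }
      return full;
    }
    return namenode.getListing(src);              // old RPC: full paths already
  }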

          Sanjay Radia added a comment -

          Sorry I hit <add> by mistake.
For example, will we be able to handle the addition of new methods? (I don't believe adding new parameters works.)
What would be interesting is to look at the 20 protocol and compare it to the trunk protocol and determine which
of those changes are compatible.

This also begs the question: if the changes were compatible then why did we bump the version number in the first place?
Is it because we have a simple rule: "any change bumps the version number"?

Longer term, I agree with Doug that using a stronger serialization mechanism like Avro is part of the answer.

          Sanjay Radia added a comment -

          >Could you provide a few examples of compatible changes to ClientProtocol?
Hairong, I have the same question as Nicolas. Can you give a couple of concrete examples of changes to the dfs client
protocol that keep it compatible?

          Doug Cutting added a comment -

          I'll repeat here my comments from HDFS-1335: This patch seems fine: it's simple and does not alone introduce significant risk.

However, subsequent changes to ProtocolCompatible will need to be carefully reviewed and tested. We don't yet have a testing framework to do that. So we're delaying that work (interoperability testing of different versions) until we actually have two versions that are meant to interoperate. This is a technical debt that will come due in full the first time ProtocolCompatible is altered.

          Hairong Kuang added a comment -

Erik, I believe that there are many solutions to the version compatibility problem. I would be very happy to see a proposal from you taking the capabilities approach.

As the title indicates, my proposal is a baby step towards the solution. The idea is simple and needs the least amount of change to the existing code.

          Erik Steffl added a comment -

          Was it ever considered to take the capabilities approach instead of version checking?

See e.g. the IMAP protocol (chosen as an example because it's a widely used protocol that has a number of client and server implementations working successfully with each other); this looks like a fairly comprehensive link to the protocol specs: http://www.networksorcery.com/enp/protocol/imap.htm

Seems like as Hadoop matures it could make sense to provide a method for both client and server to negotiate how they talk to each other (and possibly adjust to each other). At this point the many-implementations scenario is not valid yet, but capabilities are useful anyway (cross-version) and (very likely in my opinion) sooner or later we will get different implementations (this happens with pretty much any successful non-proprietary client/server system, and even some proprietary ones).

          Hairong Kuang added a comment -

Doug, now that the patch to HDFS-1335 is out there, do you see any issue with this approach? Today I rewrote HDFS-946 to make it backward compatible and tested it on my local machine. It worked out very well.

          Doug Cutting added a comment -

          I'd like to have a better understanding of how this will be used (HDFS-1335) before it is committed.

          Hairong Kuang added a comment -

          Hudson is stuck. I ran tests on my local box. Here are the results:

          ant test-patch
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Javadoc warnings are not related to this patch.

          ant test:
          BUILD SUCCESSFUL
          Total time: 10 minutes 20 seconds

          Hairong Kuang added a comment -

          This patch changed a comment.

          Doug Cutting added a comment -

          I think the long-term plan is to try to avoid version incompatibilities. With Avro-based RPC (HADOOP-6659) we could:

          • handle the addition of new parameters and fields with reasonable default values
          • handle changes to method semantics by adding new methods with the new semantics (eventually perhaps removing the old methods).

We'd still need to design and test carefully for back-compatibility, e.g., make sure new parameters and fields have good defaults, add new methods when semantics change, and only remove an old method in a major release where we're willing to lose some back-compatibility. This approach might not need an explicit check of which versions are compatible, since Avro could detect the missing method.

          Unfortunately work on HADOOP-6659 has been stopped by a veto.

The approach you propose here could be useful until work on HADOOP-6659 is resumed, and may even be useful afterwards, as it might permit more concise incompatibility errors. It's not a very large or complex change, so I don't see it creating a lot of risk or adding significant maintenance costs.

          Hairong Kuang added a comment -

          Here is a patch for review.

1. It introduces a new exception, VersionIncompatible, which means that the server is not able to serve the client because they are not compatible, while VersionMismatch only means that the client and server have different versions and the server could not determine whether the client is compatible with the server or not.

2. It adds a set of unit tests to illustrate how client & server can talk across versions.

          Other changes are trivial.

          Hairong Kuang added a comment -

It looks like all the changes I need to make are in ipc, so I am moving this jira from hdfs to common.

          Hairong Kuang added a comment -

Yes, it is going to be a hard-coded compatibility list. The goal of this jira is not to do anything intelligent for automatic compatibility detection. Instead it provides a starting point for our developers to make cross-version changes to ClientProtocol. In many companies' setups, it is very hard to upgrade both the client & server at the same time. It would be nice if HDFS could support rolling upgrades.

The more I think about it, not only the client but also the server should participate in the version compatibility checking. Suppose the client is old but the server is new. The client does not know the new server, but the new server should be able to tell whether the client is allowed to talk to it or not. Whoever is newer should be the one that does the compatibility check. So VersionedProtocol#getProtocolVersion should do a compatibility check if the server is newer than the client.

          Tsz Wo Nicholas Sze added a comment -

Could you provide a few examples of compatible changes to ClientProtocol? I wonder how a DfsClient could check whether it is compatible with the server. Are there rules, or just a hard-coded compatibility list?

The description is quite abstract. I do not understand what the goals are. Could you tell us your use cases?

          Hairong Kuang added a comment -

If a version X client can communicate with a version Y server, then X and Y are compatible.

Right now, isCompatible should return false. But later on, if anybody adds a compatible change to ClientProtocol, s/he could modify the isCompatible method to reflect this change.
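
A minimal sketch of what that hard-coded list could look like (hypothetical; the version pair shown is made up):

  public class ProtocolCompatible {
    // Returns true only for version pairs explicitly known to interoperate.
    public static boolean isCompatible(long clientVersion, long serverVersion) {
      // Example of a future entry, once a compatible-only change lands:
      // if (Math.min(clientVersion, serverVersion) == 60 &&
      //     Math.max(clientVersion, serverVersion) == 61) return true;
      return false;  // today: no compatible cross-version pairs
    }
  }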

          Tsz Wo Nicholas Sze added a comment -

          Would you please first define "compatible"?


            People

• Assignee:
  Hairong Kuang
• Reporter:
  Hairong Kuang
• Votes:
  0
• Watchers:
  18