Hadoop Common / HADOOP-6949

Reduces RPC packet size for primitive arrays, especially long[], which is used in block reporting

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
      Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.
    • Tags:
      Writable, primitive arrays, RPC

      Description

      The current implementation of org.apache.hadoop.io.ObjectWritable marshals primitive array types as a general object array: array type string + array length + (element type string + value) * n.

      It should not be necessary to specify the element type for each element of a primitive array.
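
      For illustration, here is a minimal sketch, assuming a hypothetical helper class (not the actual Hadoop code), of the per-element layout described above versus a compact layout that writes the component type only once:

      // Illustrative sketch of the two wire layouts for a long[]; hypothetical helper,
      // not the Hadoop implementation.
      import java.io.DataOutput;
      import java.io.IOException;

      class WireFormatSketch {
        // ObjectWritable-style layout: the element type string is repeated n times.
        static void writeVerbose(DataOutput out, long[] arr) throws IOException {
          out.writeUTF("[J");           // array type string
          out.writeInt(arr.length);     // array length
          for (long v : arr) {
            out.writeUTF("long");       // element type string, once per element
            out.writeLong(v);           // element value
          }
        }

        // Compact layout: the component type is written once for the whole array.
        static void writeCompact(DataOutput out, long[] arr) throws IOException {
          out.writeUTF("long");         // component type, written once
          out.writeInt(arr.length);     // array length
          for (long v : arr) {
            out.writeLong(v);           // element values only
          }
        }
      }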

      Attachments

      1. ObjectWritable.diff
        5 kB
        Navis
      2. arrayprim.patch
        20 kB
        Matt Foley
      3. arrayprim_v4.patch
        23 kB
        Matt Foley
      4. arrayprim_v5.patch
        24 kB
        Matt Foley
      5. arrayprim_v6.patch
        24 kB
        Matt Foley
      6. arrayprim_v7.patch
        24 kB
        Matt Foley

        Issue Links

          Activity

          Navis created issue -
          Navis made changes -
          Field Original Value New Value
          Attachment ObjectWritable.diff [ 12454281 ]
          Todd Lipcon added a comment -

          Hi Navis,

          Please format your change as a "patch" file that can be applied against the trunk of the hadoop-common repository. This one has a diff of 1.txt and 2.txt which makes it more difficult to apply.

          Regarding the change, it seems good except that it's backwards-incompatible for ObjectWritable. Since ObjectWritable is a public facing class, it's likely that some people have stored them on disk (eg in sequencefiles) so I don't think we can make a breaking change like this.

          Do you have benchmarks that show a significant improvement by making this change? If so, it may be worth creating a new CompactObjectWritable that's used only in the RPC layer, for example, and marked as a Private Evolving interface. If the improvement is not significant, we should skip this change since we'll eventually move to a more compact RPC format (Avro) anyway.

          Navis added a comment -

           You're right. It's just a mere aesthetic mod, but a real improvement.

           It was the smallest part of an in-house NN-HA implementation. For a DN with a couple of million blocks, it reduced block reporting time from 21 sec to 17 sec.

          I'll resolve this issue as a 'Not-A-Problem'. Thanks for your comment.

          Navis made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Not A Problem [ 8 ]
          Konstantin Shvachko added a comment -

           This seems to have a higher impact on performance than previously reported. I see that many VersionedProtocol methods use arrays as their input or output parameters, which could benefit substantially from this optimization.
          See discussion started in HDFS-1583.

          Konstantin Shvachko made changes -
          Resolution Not A Problem [ 8 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Konstantin Shvachko added a comment -

           From the comment:
          > move ObjectWritable's array i/o to protected methods, add a subclass with an optimized
          > implementation of these methods, then use that subclass in RPC in place of ObjectWritable.

          Sounds like a plan to me.

           There is a patch for arrays of primitive types attached to this jira; it can be used as a starting point, I guess.
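
           A rough sketch of the subclass approach quoted above might look like the following (hypothetical class and method names, with stand-in bodies; this is only one way to structure it, not the patch attached here):

           // Hypothetical skeleton of the "protected array i/o + RPC-only subclass" idea.
           import java.io.DataInput;
           import java.io.DataOutput;
           import java.io.IOException;

           class ObjectWritableLike {
             // Array i/o exposed as protected hooks so a subclass can override them.
             protected void writeArray(DataOutput out, Object array) throws IOException {
               out.writeUTF("verbose");   // stand-in for the existing per-element format
             }
             protected Object readArray(DataInput in) throws IOException {
               return in.readUTF();       // stand-in for reading the existing format
             }
           }

           // Used only by the RPC layer in place of the parent class.
           class CompactObjectWritableLike extends ObjectWritableLike {
             @Override
             protected void writeArray(DataOutput out, Object array) throws IOException {
               out.writeUTF("compact");   // stand-in for the optimized wire format
             }
             @Override
             protected Object readArray(DataInput in) throws IOException {
               return in.readUTF();       // stand-in for decoding the optimized format
             }
           }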

          Konstantin Shvachko made changes -
          Link This issue blocks HDFS-1583 [ HDFS-1583 ]
          dhruba borthakur added a comment -

          +1, this sounds like a good thing to do (for performance)

          Konstantin Shvachko added a comment -

           This should have a huge impact on blockReport(), and on journal() edits streaming to the BN, according to Liyin's benchmarks.

          Matt Foley added a comment -

          The attached patch is for consideration and discussion, not immediate submission. It is a little more complex than the prior suggestion, but has the following benefits:
          1. it is fully backward compatible with any persisted data that may be out there, because it can still successfully receive/read the old format
          2. it has minimal changes to the ObjectWritable module
          3. it is consistent with current practice regarding *Writable classes
          4. It automatically benefits all code that is currently pumping arrays of longs or arrays of bytes through the RPC and ObjectWritable mechanisms, including block reports and journaling.
          5. It has fully optimized wire format, with almost no object overhead, unlike solutions based on ArrayWritable.

          That said, it may still be considered an incompatible change:
          While it is fully backward compatible with old persisted data, it is not backward compatible with older versions of Client software: A NEWER RPC RECEIVER can correctly receive data from an OLDER SENDER, but an OLDER RECEIVER will not correctly receive arrays of primitives sent from a NEWER SENDER.

          To achieve that level of backward compatibility requires some sort of version negotiation scheme for the RPC, such as remembering whether the sender used the new or old format when sending to you. I'm not aware of much support for that level of complexity.

          What is the sense of the community? Is this level of incompatibility with older Clients acceptable for, say, a v23 change?

          Matt Foley made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Assignee Matt Foley [ mattf ]
          Fix Version/s 0.23.0 [ 12315569 ]
          Tags Writable, primitive arrays, RPC
          Matt Foley made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matt Foley made changes -
          Attachment arrayprim.patch [ 12469054 ]
          Tsz Wo Nicholas Sze added a comment -

          > What is the sense of the community? Is this level of incompatibility with older Clients acceptable for, say, a v23 change?

           We allow incompatible changes in major releases, so it is acceptable for 0.23.

          Tsz Wo Nicholas Sze made changes -
          Hadoop Flags [Incompatible change]
          Priority Trivial [ 5 ] Major [ 3 ]
          Todd Lipcon added a comment -

          A few comments:

          • in readFields/write - add an else { throw new IllegalStateException("no branch for type: " + componentType); }

            or something just to be safe?

           • For the functions that loop over the arrays, it might be faster to cast the Object to a foo[] rather than using Array.setFoo/Array.getFoo - we might get some extra bounds-check elimination, and there doesn't seem to be any lost clarity in doing so
          • For the special case of reading and writing byte[] we can use in.readFully() and out.write() with the whole array, which might be a bit faster.
          • Should we consider using variable length coding for some of these types, since we have the opportunity to change the format? Could even do something fancy like delta-code it and then vint-encode for more space savings? Would be worth a benchmark to see if it helps.
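
           To make the second and third points concrete, here is a hedged sketch, assuming hypothetical writer methods for long[] and byte[] (illustrative only, not the attached patch):

           // Contrasting the loop styles discussed above.
           import java.io.DataOutput;
           import java.io.IOException;
           import java.lang.reflect.Array;

           class LoopSketch {
             // Reflective access: one Array.getLong call per element.
             static void writeViaReflection(DataOutput out, Object arr) throws IOException {
               int len = Array.getLength(arr);
               out.writeInt(len);
               for (int i = 0; i < len; i++) {
                 out.writeLong(Array.getLong(arr, i));
               }
             }

             // Typed cast: gives the JIT a better chance to drop per-element checks.
             static void writeViaCast(DataOutput out, Object arr) throws IOException {
               long[] longs = (long[]) arr;
               out.writeInt(longs.length);
               for (long v : longs) {
                 out.writeLong(v);
               }
             }

             // byte[] special case: bulk write instead of a per-element loop
             // (the read side would use in.readFully(bytes)).
             static void writeBytes(DataOutput out, byte[] bytes) throws IOException {
               out.writeInt(bytes.length);
               out.write(bytes);
             }
           }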
          Matt Foley added a comment -

          Hi Todd, thanks for reading over this. Agree with the first three items, and will include in next version of the patch. I'll wait a couple days to see if other suggestions arise.

          The fourth item is tempting, and I had considered running long arrays through a zip filter, which achieves much the same goal. However, I'm told by folks who've been around longer than I that the wire format was kept in "clear" on purpose for simplicity and ease of maintenance. So I think we should leave it that way, unless there is overwhelming support for a compressed format.

          Doug Cutting added a comment -

           We don't currently support cross-version RPC, so changing the RPC wire format is certainly acceptable. My concern, and the reason I suggested adding a subclass of ObjectWritable used by RPC that writes the new format, is that folks might be using ObjectWritable for file-based data. In this case files written by the new version would not be readable by the old version. Some sites run different versions of Hadoop on different clusters and use distcp/HFTP to sync sets of files between the clusters. If these files contained ObjectWritable then this could make files sync'd from a cluster running a newer version unreadable on a cluster running an older version. So the safest change would be to only change RPC and not ObjectWritable.

          Matt Foley added a comment -

           Doug, thanks for bringing up this important use case. After looking over the code, it appears there is a single line in WritableRpcEngine where all RPC code calls ObjectWritable.writeObject(). If this one call is modified to allow the more compact array format, we get the benefit for all RPC protocols based on WritableRpcEngine, without having to change any of the RPC APIs, and with no danger to the other callers of ObjectWritable.writeObject(). Any other callers of ObjectWritable.writeObject() may choose to use the new call, but for safety's sake we default to the old format.

          Please take a look at this new patch. Besides the changes Todd recommended, I broke ObjectWritable.writeObject() into a stub with the old API signature, and a slightly modified implementation with an additional boolean argument controlling whether compact format is used for arrays of primitives. If false, the behavior is as before; if true the behavior is the new compact format. Callers of the old stub always get the old ("false") behavior. And finally I changed the one line in WritableRpcEngine to call the new API with the boolean argument "true".

          I also re-confirmed that this change gives the improvement in Block Report processing RPC overhead time.
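
           As a rough sketch of the overload pattern described here (simplified, with stand-in bodies; in the actual patch the compact path goes through ArrayPrimitiveWritable):

           // Hypothetical illustration of the old-signature stub delegating to a flagged implementation.
           import java.io.DataOutput;
           import java.io.IOException;

           class CompactWriteSketch {
             // Old entry point keeps its signature and always uses the original format.
             static void writeObject(DataOutput out, Object instance, Class<?> declaredClass)
                 throws IOException {
               writeObject(out, instance, declaredClass, false);
             }

             // New implementation takes an extra flag; only the RPC layer passes true.
             static void writeObject(DataOutput out, Object instance, Class<?> declaredClass,
                                     boolean allowCompactArrays) throws IOException {
               if (allowCompactArrays && declaredClass.isArray()
                   && declaredClass.getComponentType().isPrimitive()
                   && instance.getClass().getName().equals(declaredClass.getName())) {
                 out.writeUTF("compact");   // stand-in for the compact array path
               } else {
                 out.writeUTF("legacy");    // stand-in for the existing per-element path
               }
             }
           }

           The single call site in WritableRpcEngine would then pass true, while every other caller keeps the old signature and therefore the old format.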

          Matt Foley made changes -
          Attachment arrayprim_v4.patch [ 12469796 ]
          Doug Cutting added a comment -

          Matt, this looks like a great way to address the compatibility issue. +1

          Hairong Kuang added a comment -

           Cool optimization! With this, shall we bump up the RPC version number?

          Doug Cutting added a comment -

          Hairong, good point: this should probably change the RPC version number.

          Konstantin Shvachko added a comment -

          The compatible way of doing this is really nice. A couple of questions.

          1. Can/should we extend this to arrays of non-primitive types? This should benefit return types for calls like listStatus() and getBlockLocations() on a large directory.
          2. Is it enough to increment rpc version only? This changes serialization of each and every VersionedProtocol. So formally they all should be incremented.
          3. This seems to be a very important optimization. Would like to raise the question of applying it to 0.22.
          Matt Foley added a comment -

          Regarding VersionedProtocols: By searching for all implementations of the VersionedProtocol API "getProtocolVersion(String, long)", and the values they return, I found the following 16 version constants:
          hdfs.protocol.ClientDatanodeProtocol.versionID; (8)
          hdfs.protocol.ClientProtocol.versionID; (65)
          hdfs.server.protocol.DatanodeProtocol.versionID; (27)
          hdfs.server.protocol.InterDatanodeProtocol.versionID; (6)
          hdfs.server.protocol.NamenodeProtocol.versionID; (5)
          mapred.AdminOperationsProtocol.versionID; (3)
          mapred.InterTrackerProtocol.versionID; (31)
          mapred.TaskUmbilicalProtocol.versionID; (19)
          mapreduce.protocol.ClientProtocol.versionID; (36)

          hadoop.security.authorize.RefreshAuthorizationPolicyProtocol.versionID; (1)
          hadoop.security.RefreshUserMappingsProtocol.versionID; (1)
          hadoop.ipc.AvroRpcEngine.VERSION; (0)

          TEST:
          hadoop.security.TestDoAsEffectiveUser.TestProtocol.versionID (1)
          hadoop.ipc.MiniRPCBenchmark.MiniProtocol.versionID; (1)
          hadoop.ipc.TestRPC.TestProtocol.versionID; (1)
          mapred.TestTaskCommit.MyUmbilical unnamed constant (0)

          The first nine are clearly production version numbers. The next three (two security and one avro) do not seem to have ever been incremented and I wonder if they need to be now. The last four are test-specific and I think should not be incremented. So please advise:

          1. Do all the first nine protocols use WritableRPCEngine and therefore need to have their version numbers incremented?
          2. Do the next three need to have their version numbers incremented for this change?
          3. Do you agree that the four Test protocol versions should not change?
          4. Did I miss any that you are aware of?

          Thank you. I will put together the versioning patch when we have consensus on what to change.

          Konstantin, regarding your suggestion to extend this enhancement to arrays of non-Primitive objects:
          There is a simple way to extend this approach to arrays of Strings and Writables. I coded it up and have it available, it adds an ArrayWritable.Internal very similar to ArrayPrimitiveWritable.Internal. Nice and clean. Of course the size improvement isn't as dramatic since the ratio of label size to object size isn't as bad as with primitives, but the performance improvement is still there (not having to go through the decision loop for every element of a long array).

          However, using it would cause a significant change in semantics:
          The current ObjectWritable can handle non-homogeneous arrays containing different kinds of Writables, and nulls.
          The optimization we are discussing here is removing the type-tagging of every array entry, thereby assuming that the array is in fact strictly homogeneous, including having no null entries. There is also the question of what type declaration the container array has on entry, and what type it should have on exit. In the current code the only restriction on array type is
          (Writable[]).type isAssignableFrom (X[]).type and
          X.type isAssignableFrom x[i].getClass() for all elements i
          Also in the current code, the array produced on the receiving side is always simply Writable[].

          Questions:
          1. Is it acceptable to assume/mandate that all arrays of Writables passed in to RPC shall be homogeneous and have no null elements? Note that this is a very strict form of homogeneity, forbidding even subclass instances, because, for example, if you define a class FooWritable and a subclass SubFooWritable, you can put them both in an array of declared type FooWritable[], but the receiving-side deserializer will ONLY produce objects of type FooWritable, and will fail entirely unless the serialized output of FooWritable.write() and SubFooWritable.write() happen to be compatible (which is too complicated an exception to try to explain, IMO).

           2. On the receiving side, should we (i) continue producing an array of type Writable[], or (ii) preserve the type of the array during the implicit encoding process, or (iii) produce an array whose componentType is the same as the actual element type, assuming the array is indeed strictly homogeneous so all elements are the same type? All three are easily done, but have implications about what can be done with the array later.
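
           To illustrate the strict-homogeneity concern in question 1, consider a hypothetical FooWritable with a subclass that writes an extra field (names invented for this example):

           // Why a single per-array type tag breaks for subclass elements.
           import java.io.DataInput;
           import java.io.DataOutput;
           import java.io.IOException;
           import org.apache.hadoop.io.Writable;

           class FooWritable implements Writable {
             int a;
             public void write(DataOutput out) throws IOException { out.writeInt(a); }
             public void readFields(DataInput in) throws IOException { a = in.readInt(); }
           }

           class SubFooWritable extends FooWritable {
             int b;  // extra field: the wire format differs from FooWritable's
             public void write(DataOutput out) throws IOException { super.write(out); out.writeInt(b); }
             public void readFields(DataInput in) throws IOException { super.readFields(in); b = in.readInt(); }
           }

           // If a FooWritable[] containing SubFooWritable elements were sent with a single type tag,
           // the receiver would instantiate FooWritable for every element and call readFields(); the
           // extra int written by each SubFooWritable would then be misread as part of the next element.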

          I would like to get the ArrayPrimitiveWritable committed soon, so unless the answers to the above are really obvious to all interested parties, maybe I should open a new Jira for a longer discussion?

          Finally, regarding putting it in v22, it should port trivially, but is it acceptable to have an incompatible protocol change in a released version? Thanks.

          Hairong Kuang added a comment -

           Matt, for the version number, I believe you just need to bump up the RPC version number: org.apache.hadoop.ipc.Server#CURRENT_VERSION. This should be good enough.

          Konstantin Shvachko added a comment -
           • I recommend looking for subclasses of VersionedProtocol rather than searching. You can see that most of the protocols you listed implement VersionedProtocol, except for two, AvroRpcEngine and WritableRpcEngine, which implement RpcEngine. Avro has its own serialization object, afaiu, so we don't need to change the version for the latter two.
           • I think it is fine to change only ipc.Server#CURRENT_VERSION, as Hairong suggested, which is the VersionedProtocol version, rather than all subclasses, as it will provide the incompatibility for all of them at once.
          • As I said in my comment in HDFS-1583, Java does not support heterogeneous arrays, see ArrayStoreException for the reference. So we don't need to support such generic semantic. We can assume the array is strictly homogeneous.
          • Yes we need to support only arrays of Writables (and of primitive types).
          • NULL values should be supported.
           • In my opinion this is an important optimization to be included in 0.22. If people have doubts I can start a vote on general. Lmk.
          Suresh Srinivas added a comment -

          > As I said in my comment in HDFS-1583, Java does not support heterogeneous arrays, see ArrayStoreException for the reference. So we don't need to support such generic semantic. We can assume the array is strictly homogeneous.

          Assuming you are talking about labeling the array only for the type it carries, the approach you are suggesting will not work for heartbeat response, where DatanodeCommand[] is the response, with array elements carrying subtypes of DatanodeCommand, which should be labelled once per array element.

          Also a protocol could choose to give a response of Object[]. Not sure I understand the reference you posted and how it is relevant to this discussion.

          I prefer this issue focusing on primitive types and opening a separate jira for non-primitive type, as I think it needs more discussion.

          Matt Foley added a comment -

          This patch is same as v4, plus increments the RPC version number, org.apache.hadoop.ipc.Server#CURRENT_VERSION. Thanks, Hairong, for the clarification.

          HADOOP-7158 has been opened for the discussion of similar optimization of wire format for homogeneous arrays of non-primitive objects. I'll post a prototype patch to that Jira tomorrow.

          Unless the auto-tester finds a problem, I think this v5 patch is ready to go. Enabling patch test. Thanks.

          Matt Foley made changes -
          Attachment arrayprim_v5.patch [ 12472159 ]
          Matt Foley made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
          Introduces a much more efficient wire format to transmit arrays of primitives. This is done in a way that is 100% backward compatible with previously persisted data, so any stored data will still be readable. However, data written in the new version cannot be read by a process using the old version of protocol. Only RPC uses the new format; file writing protocols for application data are unchanged.
          Matt Foley made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matt Foley made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matt Foley added a comment -

          Re-submitting same patch file in hopes that the auto-tester will run.

          Matt Foley made changes -
          Attachment arrayprim_v5.patch [ 12472729 ]
          Matt Foley made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matt Foley made changes -
          Attachment arrayprim_v5.patch [ 12472729 ]
          Matt Foley made changes -
          Attachment arrayprim_v5.patch [ 12472159 ]
          Matt Foley added a comment -

          One more time, trying to get auto-test to run.

          Matt Foley made changes -
          Attachment arrayprim_v5.patch [ 12472850 ]
          Matt Foley made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
          Introduces a much more efficient wire format to transmit arrays of primitives. This is done in a way that is 100% backward compatible with previously persisted data, so any stored data will still be readable. However, data written in the new version cannot be read by a process using the old version of protocol. Only RPC uses the new format; file writing protocols for application data are unchanged.
          Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
          Introduces a much more efficient wire format to transmit arrays of primitives. This is done in a way that is 100% backward compatible with previously persisted data, so any stored data will still be readable. However, data written in the new version cannot be read by a process using the old version of protocol. Only RPC uses the new format; file writing protocols for application data are unchanged..
          Affects Version/s 0.22.0 [ 12314296 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12472850/arrayprim_v5.patch
          against trunk revision 1078669.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 1078 javac compiler warnings (more than the trunk's current 1072 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/303//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/303//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/303//console

          This message is automatically generated.

          Matt Foley added a comment -

           Added @SuppressWarnings("deprecation") annotations to 6 uses of UTF8 class methods (these usages are consistent with those in ObjectWritable.java).

          Matt Foley made changes -
          Attachment arrayprim_v6.patch [ 12472916 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12472916/arrayprim_v6.patch
          against trunk revision 1079071.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/304//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/304//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/304//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          I am perfectly fine with dealing with complex types in another jira. If Suresh could review this it should be good to commit.

          Suresh Srinivas added a comment -

          This is an important patch. Thanks for doing it in a backward compatible way and also in a way where both on disk format and RPC format can co-exist.

          Comments:

           1. ArrayPrimitiveWritable
            • Class javadoc is not very clear - not sure what you mean by boxed type not supported by ObjectWritable and why that should be mentioned in the javadoc. I am not sure also how clients can provide external locking; more appropriate would be to say, pass a copy of the array, if concurrent modifications are expected.
            • Comments related to declaredComponentType are not clear. Can you please add a description of what componentType and declaredComponentType are and how they are used, before adding further description.
            • Is PrimitiveArrayWritable a better name?
            • Add @InterfaceAudience.Public and @InterfaceStability.Stable
            • Throw HadoopIllegalArgumentException instead of NullPointerException and IllegalArgumentException
            • Optional: the read/write methods could be made static methods in the WritableUtils class, so they can be used by others. The PrimitiveType map and isCandidatePrimitive() could also move to the helper.
            • Rename isCandidatePrimitive() to isPrimitive()?
            • Is constructor with componentType as argument used any where?
            • Some methods could be private - for example - getDeclaredComponentType(), isDeclaredComponentType(),
            • Can you use Text instead of UTF8? This avoids deprecation warnings. You could also use WritableUtils to write and read string. This can be done in many other places in the patch.
            • I prefer MalformedInputException instead of ProtocolException (see Text.java for example)
          2. ObjectWritable.java
            • Can you call set() method while setting instance and declaredClass in #writeObject() method?
            • Not sure about the comment: // This case must come after the "isArray()" case
            • Not sure why this condition is required in #writeObject() - instance.getClass().getName().equals(declaredClass.getName())
          3. TestArrayWritable
            • In the class javadoc can you add link to PrimitiveArrayWritable?
            • Can you make in, out, bigSet, resultSet and expectedSet final?
            • Should we prefer reading and writing objects using readFields and writeFields - since they are the methods supporting Writable contract?
          Matt Foley added a comment -
           1. ArrayPrimitiveWritable
          • Class javadoc is not very clear - not sure what you mean by boxed type not supported by ObjectWritable and why that should be mentioned in the javadoc. I am not sure also how clients can provide external locking; more appropriate would be to say, pass a copy of the array, if concurrent modifications are expected.

          Javadoc comments simplified.

           • Comments related to declaredComponentType are not clear. Can you please add a description of what componentType and declaredComponentType are and how they are used, before adding further description.

          Provided.

          • Is PrimitiveArrayWritable a better name?

          By common usage, "PrimitiveArrayWritable" would imply a sub-class of ArrayWritable, which this isn't. So I thought it best to use a different word order.

          • Add @InterfaceAudience.Public and @InterfaceStability.Stable

          Done.

          • Throw HadoopIllegalArgumentException instead of NullPointerException and IllegalArgumentException

          Done.

           • Optional: the read/write methods could be made static methods in the WritableUtils class, so they can be used by others. The PrimitiveType map and isCandidatePrimitive() could also move to the helper.

          After discussion, we agreed there is no benefit in efficiency or code clarity, so no change in this regard.

          • Rename isCandidatePrimitive() to isPrimitive()?

          Agreed. Originally, isCandidatePrimitive() provided a layer of indirection so that ArrayPrimitiveWritable didn't have to contract to support ALL primitive types. However, since it ended up supporting them all, the distinction is irrelevant.
          Changed to use Class.isPrimitive().

          • Is constructor with componentType as argument used any where?

          No, it is provided to make ArrayPrimitiveWritable more generally useful.

          • Some methods could be private - for example - getDeclaredComponentType(), isDeclaredComponentType(),

          No, if the user elects to use the typechecking capabilities of declaredComponentType, he should be able to make both of these queries.

          • Can you use Text instead of UTF8? This avoids deprecation warnings. You could also use WritableUtils to write and read string. This can be done in many other places in the patch.

          I do not want to change the usage from that of ObjectWritable. Essentially all our Writable classes use UTF8. If we want to change that, please open a different Jira.

          • I prefer MalformedInputException instead of ProtocolException (see Text.java for example)

          I agree ProtocolException isn't exactly right.
          Problem with MalformedInputException: it doesn't take a "message" argument, only an integer, so we can't use that.
          Changed to just use IOException, since that's what most of the Writable code uses.

          ==========

           2. ObjectWritable.java
          • Can you call set() method while setting instance and declaredClass in #writeObject() method?

          My understanding is that ObjectWritable.writeObject() is a static method that does not instantiate an ObjectWritable instance. The variables "instance" and "declaredClass" are both local variables in #writeObject() context. So calling the non-static "set()" would not be okay.

          • Not sure about the comment: // This case must come after the "isArray()" case

          You're right, it's not order-dependent since it uses /if/else if/else/ rather than a sequence of "if"s.
          Removed the comment.

          • Not sure why this condition is required in #writeObject() - instance.getClass().getName().equals(declaredClass.getName())

          I'm being paranoid. There can be a range of behaviors if declaredClass isn't exactly the same as instance.getClass(), so I'm refusing to allowCompactArrays if the caller isn't declaring the true type.

          ==========

           3. TestArrayPrimitiveWritable
          • In the class javadoc can you add link to PrimitiveArrayWritable?

          Done. Good catch.

          • Can you make in, out, bigSet, resultSet and expectedSet final?

          Sure.

          • Should we prefer reading and writing objects using readFields and writeFields - since they are the methods supporting Writable contract?

          For "normal" i/o I do use readFields() and write(). Then I test to see that the wire format is as expected, by going around those methods and directly invoking lower-level functionality. I think this is the only way to get confidence in the wire format encoders and decoders.

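          For readers who want to see the new class in action, here is a minimal sketch of a write/read round trip for a long[] (the block-report case this issue targets). It is based on the API points discussed above (the Class-typed constructor, set()/get(), and the standard Writable write()/readFields() contract), so treat the exact calls as illustrative rather than as quoted from the patch.

              import java.io.ByteArrayInputStream;
              import java.io.ByteArrayOutputStream;
              import java.io.DataInputStream;
              import java.io.DataOutputStream;
              import java.io.IOException;

              import org.apache.hadoop.io.ArrayPrimitiveWritable;

              public class ArrayPrimitiveWritableSketch {
                public static void main(String[] args) throws IOException {
                  long[] blockIds = {101L, 102L, 103L};   // stands in for a block-report payload

                  // Writer side: the Class-typed constructor enables declaredComponentType
                  // checking, so set() should reject anything that is not a long[].
                  ArrayPrimitiveWritable writer = new ArrayPrimitiveWritable(long.class);
                  writer.set(blockIds);

                  ByteArrayOutputStream buf = new ByteArrayOutputStream();
                  writer.write(new DataOutputStream(buf));

                  // Reader side: readFields() restores the array; get() returns it as Object.
                  ArrayPrimitiveWritable reader = new ArrayPrimitiveWritable();
                  reader.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
                  long[] roundTripped = (long[]) reader.get();

                  System.out.println(roundTripped.length + " longs in " + buf.size() + " bytes on the wire");
                }
              }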
          Matt Foley made changes -
          Attachment arrayprim_v7.patch [ 12474048 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12474048/arrayprim_v7.patch
          against trunk revision 1082788.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/315//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/315//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/315//console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          +1 for the patch.

          Suresh Srinivas added a comment -

          I committed the patch. Thank you Matt.

          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Incompatible change] [Incompatible change, Reviewed]
          Release Note Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
          Introduces a much more efficient wire format to transmit arrays of primitives. This is done in a way that is 100% backward compatible with previously persisted data, so any stored data will still be readable. However, data written by the new version cannot be read by a process using the old version of the protocol. Only RPC uses the new format; file-writing protocols for application data are unchanged.
          Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
          Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.
          Resolution Fixed [ 1 ]
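          As a rough sense of the savings the new release note describes, the following back-of-the-envelope comparison assumes the old layout repeats the element type string for every element of a long[], while the new writable sends one type tag followed by packed values. The header byte counts are illustrative approximations, not the patch's exact encoding.

              /** Rough wire-size comparison; byte counts are illustrative, not the exact encoding. */
              public class WireSizeEstimate {
                public static void main(String[] args) {
                  int n = 1_000_000;                      // longs in a large block report
                  int typeTag = 2 + "long".length();      // UTF8-style string: 2-byte length + characters
                  long oldBytes = typeTag + 4 + (long) n * (typeTag + 8); // type string repeated per element
                  long newBytes = typeTag + 4 + (long) n * 8;             // single type tag, then packed values
                  System.out.printf("old ~%,d bytes, new ~%,d bytes (%.0f%% smaller)%n",
                      oldBytes, newBytes, 100.0 * (oldBytes - newBytes) / oldBytes);
                }
              }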
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #531 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/531/)
          HADOOP-6949. Reduce RPC packet size of primitive arrays using ArrayPrimitiveWritable instead of ObjectWritable. Contributed by Matt Foley.

          Matt Foley made changes -
          Link This issue is related to HADOOP-7158 [ HADOOP-7158 ]
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #638 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/638/)
          HADOOP-6949. Reduce RPC packet size of primitive arrays using ArrayPrimitiveWritable instead of ObjectWritable. Contributed by Matt Foley.

          Konstantin Shvachko added a comment -

          I think we should also commit it to 0.22.

          Matt Foley added a comment -

          Since this is an incompatible protocol change, please do start whatever discussion is needed in the community.

          Matt Foley added a comment -

          Hi Konstantin, the vote seems to be pretty strongly in favor, so I went ahead and ran some tests of the patch against v0.22. Confirmed that the patch applies automatically, and hadoop/common compiles and successfully runs core unit tests.

          Konstantin Shvachko added a comment -

          The vote was conducted on the dev lists for common, hdfs, and mapreduce.
          I counted nine +1s and no -1s.
          We can commit it to 0.22.

          Konstantin Shvachko added a comment -

          I just committed this to branch 0.22. Thank you Matt.

          Konstantin Shvachko made changes -
          Fix Version/s 0.22.0 [ 12314296 ]
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #544 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/544/)
          Committing HADOOP-6949 to branch 0.22.

          Matt Foley added a comment -

          Great! Thanks, Konstantin.

          Hudson added a comment -

          Integrated in Hadoop-Common-22-branch #38 (See https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/38/)
          HADOOP-6949. Merge -c 1083957 from trunk to branch 0.22.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #654 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/654/)
          Committing HADOOP-6949 to branch 0.22.

          Matt Foley made changes -
          Link This issue blocks HDFS-2126 [ HDFS-2126 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue blocks HDFS-2126 [ HDFS-2126 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-2126 [ HDFS-2126 ]

            People

            • Assignee:
              Matt Foley
              Reporter:
              Navis
            • Votes:
              0
              Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 10m
                Remaining: 10m
                Logged: Not Specified

                  Development