Regarding VersionedProtocols: By searching for all implementations of the VersionedProtocol API "getProtocolVersion(String, long)", and the values they return, I found the following 16 version constants:
mapred.TestTaskCommit.MyUmbilical unnamed constant (0)
The first nine are clearly production version numbers. The next three (two security and one avro) do not seem to have ever been incremented and I wonder if they need to be now. The last four are test-specific and I think should not be incremented. So please advise:
1. Do all the first nine protocols use WritableRPCEngine and therefore need to have their version numbers incremented?
2. Do the next three need to have their version numbers incremented for this change?
3. Do you agree that the four Test protocol versions should not change?
4. Did I miss any that you are aware of?
Thank you. I will put together the versioning patch when we have consensus on what to change.
Konstantin, regarding your suggestion to extend this enhancement to arrays of non-Primitive objects:
There is a simple way to extend this approach to arrays of Strings and Writables. I coded it up and have it available, it adds an ArrayWritable.Internal very similar to ArrayPrimitiveWritable.Internal. Nice and clean. Of course the size improvement isn't as dramatic since the ratio of label size to object size isn't as bad as with primitives, but the performance improvement is still there (not having to go through the decision loop for every element of a long array).
However, using it would cause a significant change in semantics:
The current ObjectWritable can handle non-homogeneous arrays containing different kinds of Writables, and nulls.
The optimization we are discussing here is removing the type-tagging of every array entry, thereby assuming that the array is in fact strictly homogeneous, including having no null entries. There is also the question of what type declaration the container array has on entry, and what type it should have on exit. In the current code the only restriction on array type is
(Writable).type isAssignableFrom (X).type and
X.type isAssignableFrom x[i].getClass() for all elements i
Also in the current code, the array produced on the receiving side is always simply Writable.
1. Is it acceptable to assume/mandate that all arrays of Writables passed in to RPC shall be homogeneous and have no null elements? Note that this is a very strict form of homogeneity, forbidding even subclass instances, because, for example, if you define a class FooWritable and a subclass SubFooWritable, you can put them both in an array of declared type FooWritable, but the receiving-side deserializer will ONLY produce objects of type FooWritable, and will fail entirely unless the serialized output of FooWritable.write() and SubFooWritable.write() happen to be compatible (which is too complicated an exception to try to explain, IMO).
2. On the receiving side, should we continue producing an array of type Writable, or (ii) preserve the type of the array during the implicit encoding process, or (iii) produce an array of componentType same as the actual element type, assuming the array is indeed strictly homogeneous so all elements are the same type? All three are easily done, but have implications about what can be done with the array later.
I would like to get the ArrayPrimitiveWritable committed soon, so unless the answers to the above are really obvious to all interested parties, maybe I should open a new Jira for a longer discussion?
Finally, regarding putting it in v22, it should port trivially, but is it acceptable to have an incompatible protocol change in a released version? Thanks.