> Getting ProtocolBuffers, Thrift, and Avro types through MapReduce end to end. Obviously this includes supporting SequenceFiles, which are where the bulk of Hadoop data is currently stored.
This does not follow. We cannot currently pass an object that does not implement Writable through the shuffle without wrapping it in a Writable. However, we can and do currently support input and output of objects that do not implement Writable: RecordReader and RecordWriter do not require Writable. So no modifications to SequenceFile are required to permit end-to-end passage of non-Writables through MapReduce.
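To make the point concrete: RecordWriter's type parameters carry no Writable bound, so a writer can emit plain Java objects directly. Below is a minimal sketch that mimics the shape of `org.apache.hadoop.mapreduce.RecordWriter` (simplified; not the real class, which also takes a TaskAttemptContext):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified mimic of RecordWriter<K, V>: note K and V are unbounded,
// so nothing here requires Writable.
abstract class RecordWriter<K, V> {
    abstract void write(K key, V value);
    abstract void close();
}

// A writer for plain Strings -- neither type implements Writable.
class StringListWriter extends RecordWriter<String, String> {
    final List<String> lines = new ArrayList<>();
    void write(String key, String value) { lines.add(key + "\t" + value); }
    void close() { }
}

public class NonWritableDemo {
    public static void main(String[] args) {
        StringListWriter w = new StringListWriter();
        w.write("hello", "world");  // compiles and runs with no Writable wrapper
        w.close();
        System.out.println(w.lines.get(0));
    }
}
```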
> Supporting context-specific serializations (input key, input value, shuffle key, shuffle value, output key, output value, etc.) so that different serialization options can be chosen depending on the application's requirements.
This does not require a binary format, only a metadata format that can be somehow nested.
HADOOP-6420 made this possible.
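One way string-to-string metadata can be nested without a binary format is by key prefix: embed the per-context map (say, the shuffle key's serialization metadata) under a context prefix in the enclosing map. A hedged sketch, with hypothetical key names:

```java
import java.util.Map;
import java.util.TreeMap;

public class NestedMetadata {
    // Embed a child metadata map inside a parent under the given prefix.
    static void nest(Map<String, String> parent, String prefix,
                     Map<String, String> child) {
        for (Map.Entry<String, String> e : child.entrySet())
            parent.put(prefix + e.getKey(), e.getValue());
    }

    // Recover the child map by stripping the prefix back off.
    static Map<String, String> extract(Map<String, String> parent, String prefix) {
        Map<String, String> child = new TreeMap<>();
        for (Map.Entry<String, String> e : parent.entrySet())
            if (e.getKey().startsWith(prefix))
                child.put(e.getKey().substring(prefix.length()), e.getValue());
        return child;
    }

    public static void main(String[] args) {
        Map<String, String> keyMeta = new TreeMap<>();
        keyMeta.put("serialization.class", "org.example.MyKey"); // illustrative
        Map<String, String> job = new TreeMap<>();
        nest(job, "map.output.key.", keyMeta);  // hypothetical context prefix
        System.out.println(extract(job, "map.output.key."));
    }
}
```

The prefix is the only "nesting" machinery needed; everything remains a flat `Map<String,String>` on the wire.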
> This worked, but was very ugly. It led to "stringly-typed" interfaces where you needed to read all of the code to figure out what the legal values for the configuration were.
This sounds like a documentation issue, not a functional deficiency. This style is used consistently throughout Hadoop. If we seek to replace Configuration, that should perhaps be considered wholesale rather than piecemeal.
> By making the framework use typed metadata instead of the very generic, but type-less, string-to-string map, many user errors will be avoided.
The current style is to provide methods to access configurations and metadata. These methods prevent such type errors. I have not seen a large number of complaints from end users about this aspect of Hadoop.
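For illustration, here is the typed-accessor pattern in the style of Hadoop Configuration's `setInt`/`getInt`, as a minimal self-contained mimic (not the real `org.apache.hadoop.conf.Configuration`). The typed surface means user code never parses raw strings, and a malformed value fails loudly at the access point:

```java
import java.util.HashMap;
import java.util.Map;

public class Conf {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) { props.put(name, value); }

    // Typed setter: callers cannot store a non-integer under this method.
    public void setInt(String name, int value) {
        props.put(name, Integer.toString(value));
    }

    // Typed getter: parsing happens once, here, not in user code.
    public int getInt(String name, int defaultValue) {
        String v = props.get(name);
        if (v == null) return defaultValue;
        return Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.setInt("mapreduce.task.timeout", 600000);
        System.out.println(conf.getInt("mapreduce.task.timeout", 0));
    }
}
```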
> The indication that he gave when I gave the presentation on my plan 5 months ago was that he didn't like it, but wouldn't block it. He reiterated that position on this jira 6 days ago. Have you changed your mind, Doug?
I had hoped that not threatening a veto but rather providing strong criticism would elicit compromise and collaboration. It seems to have unfortunately achieved the opposite. I am sorry to learn that this strategy has failed and, yes, I am now leaning towards a veto of this issue.
> Bootstrapping wasn't a problem at all.
Bootstrapping a generic serialization system by requiring a particular serialization system is a bootstrapping problem.
> The change to the clients is the same size, regardless of whether the metadata is encoded in binary or string to string maps.
That's not true. If clients already use a Map&lt;String,String&gt;-like Configuration (as jobs do), then no change is required.
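A sketch of why: if the serialization's metadata is itself a `Map<String,String>`, a client that already holds its configuration as a string map can hand it over unchanged, with no translation layer. The interface and key names below are illustrative, not the actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical serialization interface taking string-map metadata.
interface Serialization<T> {
    void configure(Map<String, String> metadata);
}

class DemoSerialization implements Serialization<Object> {
    String className;
    public void configure(Map<String, String> metadata) {
        className = metadata.get("serialization.class");  // illustrative key
    }
}

public class ClientDemo {
    public static void main(String[] args) {
        // What a job client already has: a string-to-string configuration.
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("serialization.class", "org.example.MyType");

        DemoSerialization s = new DemoSerialization();
        s.configure(jobConf);  // passed through as-is; zero client-side change
        System.out.println(s.className);
    }
}
```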