I think Arkady's point is much more relevant than this quoting proposal, which I think is heading in the wrong direction!
There are two interfaces here: the one between map & reduce, and the one into map and out of reduce. I think they deserve different handling.
1) Map in & reduce out: by default, these should just consume bytes and produce bytes. The framework should do no interpretation or quoting. It should not try to break the output into lines, keys & values, etc.; it is just a byte stream. This will allow true binary output with zero hassle. Some thought on splits is clearly needed, though...
2) Map out & reduce in: here we clearly need keys and values. But I think quoting might be the wrong direction, and it should certainly not be the default. I think we should consider just providing an option that specifies that a new binary format will be used here. Maybe a 4-byte length followed by a binary key, followed by a 4-byte length and then a binary value? Maybe with a record terminator for sanity checking?
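A minimal Python sketch of the length-prefixed format floated in point 2 might look like the following. It assumes 4-byte unsigned big-endian lengths and a single 0x00 byte as the sanity-check terminator; the byte order, the terminator value, and the names `write_record`/`read_records` are all hypothetical choices for illustration, not something the proposal pins down.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Hypothetical sanity-check terminator; the proposal leaves its value open.
TERMINATOR = b"\x00"
# Assumed length encoding: 4-byte unsigned, big-endian.
LEN = struct.Struct(">I")


def write_record(out: BinaryIO, key: bytes, value: bytes) -> None:
    """Write one pair as <keylen><key><vallen><value><terminator>."""
    out.write(LEN.pack(len(key)))
    out.write(key)
    out.write(LEN.pack(len(value)))
    out.write(value)
    out.write(TERMINATOR)


def _read_exact(inp: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail loudly on a truncated stream."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_records(inp: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs until a clean end of stream."""
    while True:
        header = inp.read(LEN.size)
        if not header:
            return  # clean EOF, exactly on a record boundary
        if len(header) != LEN.size:
            raise EOFError("truncated key length")
        (key_len,) = LEN.unpack(header)
        key = _read_exact(inp, key_len)
        (val_len,) = LEN.unpack(_read_exact(inp, LEN.size))
        value = _read_exact(inp, val_len)
        # The lengths already delimit the fields; the terminator only
        # detects a reader that has drifted out of sync with the writer.
        if _read_exact(inp, len(TERMINATOR)) != TERMINATOR:
            raise ValueError("bad record terminator: stream out of sync")
        yield key, value
```

Because the lengths delimit the fields, keys and values can contain any bytes at all, including tabs, newlines, and the terminator byte itself, with no quoting. The cost is a fixed 9 bytes of framing per record (two lengths plus the terminator), whereas the overhead of quoting depends on how many special bytes happen to appear in the data.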
1) Adding quoting by default will break all kinds of programs that work with streaming today. This is undesirable. We should add an option, not change the default behavior.
2) Streaming should not use UTF-8 anywhere! It should assume that it is processing a stream of bytes that contains certain signal bytes, '\n' and '\t'. I think we all agree on this. Treating the byte stream as a character stream just confuses things.
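To illustrate the byte-oriented view in point 2, here is a small Python sketch that splits map output into key/value pairs using only the newline and tab bytes, without ever decoding anything. The function name `split_records` is hypothetical, and treating a line with no tab as a key with an empty value is an assumed convention for the example.

```python
from typing import Iterable, Iterator, Tuple


def split_records(stream: Iterable[bytes]) -> Iterator[Tuple[bytes, bytes]]:
    """Split newline-terminated lines into (key, value) at the first tab.

    Everything stays as bytes: nothing is decoded as UTF-8, so every byte
    value other than newline and tab passes through untouched.
    """
    for line in stream:
        # Drop the single trailing newline, if present.
        if line.endswith(b"\n"):
            line = line[:-1]
        # Split at the first tab only; later tabs remain in the value.
        # Assumed convention: a line with no tab becomes (line, b"").
        key, _sep, value = line.partition(b"\t")
        yield key, value
```

Iterating a binary file object (for example `sys.stdin.buffer`) yields byte lines split on b"\n", so this can consume raw mapper output directly. A key such as b"caf\xe9" is not valid UTF-8, so a decoding reader would reject or mangle it, while the byte-level split carries it through unchanged.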