> Hive would probably prefer a binary representation for performance [ ... ]
It might be useful to quantify the performance difference, perhaps by benchmarking writing and reading a snappy-compressed file that contains a decimal field represented either as bytes or as a string.
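For a rough sense of the size difference such a benchmark would measure, here is a minimal Java sketch (class and method names are mine, not from Avro) comparing the two encodings of the same value: the two's-complement unscaled integer versus the decimal string.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

public class DecimalEncodingSize {
    // Bytes form: the two's-complement unscaled value; the scale would be
    // carried in the schema, not in the data.
    static byte[] asBytes(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // String form: the plain decimal text, UTF-8 encoded.
    static byte[] asString(BigDecimal d) {
        return d.toPlainString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("123456789.000001");
        System.out.println(asBytes(d).length);   // 6
        System.out.println(asString(d).length);  // 16
    }
}
```

The bytes form is smaller here, but a real benchmark would also need to account for snappy compression and encode/decode CPU cost, which this sketch does not capture.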
A faster alternative to a subtype might be to use a record, e.g.:
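Such a record might look something like the following; the record and field names here are illustrative, not from the spec:

```json
{
  "type": "record",
  "name": "Decimal",
  "fields": [
    {"name": "unscaled", "type": "bytes"},
    {"name": "scale", "type": "int"}
  ]
}
```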
If we also changed GenericData to implement this directly then there would be no overhead, and the implementation would be easier & faster, since it wouldn't need a temporary buffer. It wouldn't be very useful to implementations that don't yet know about it, but neither would the binary subtype. We could add this type to the specification as something that implementations might optimize, just like a subtype. So this might be something to benchmark too.
> if it upgraded to the new version of Avro and read a file with a decimal subtype it would receive a BigDecimal when it was only expecting a ByteBuffer.
Today, if an application using specific or reflect uses BigDecimal, then it will be read as BigDecimal, since that's what's currently encoded in the schema. So the schema would change when they upgrade, but the object would not. That seems compatible to me. You?
If the application is using Generic to write, then BigDecimal will currently fail.
Since I assume that existing applications are not currently using "subType":"decimal", no application should start receiving a BigDecimal that it wasn't receiving before. If the write path is upgraded before the read path, then the application will start seeing bytes where before it saw either BigDecimal or nothing. This is a potential compatibility problem, but not the one you seem to describe.