I am not quite sure how the backward-compatibility issue in the "writable" part of the TimestampWritable code (write/readFields) could be solved by switching to a unified nanosecond-timestamp-as-long format. If readFields is presented with eight bytes, should it interpret them as a four-byte int followed by a VInt, or as a long nanosecond timestamp? Would it attempt the former and fall back to the latter on inconsistencies? And what if the bytes of a long nanosecond timestamp also happen to form a valid legacy (int/VInt) timestamp?
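To make the ambiguity concrete, here is a minimal sketch (class and method names are mine, and the legacy read is deliberately simplified to "first four bytes are seconds"; the real TimestampWritable layout has more to it). The same eight bytes parse successfully under both interpretations, and since in Hadoop's VInt encoding every possible byte value is a syntactically valid first byte of a VInt, parsing alone can never tell the two formats apart:

```java
import java.nio.ByteBuffer;

// Illustrates why eight bytes are ambiguous: the same buffer reads either
// as one big-endian long (nanosecond timestamp) or as a 4-byte int of
// seconds followed by what could be a Hadoop-style VInt.
public class TimestampAmbiguity {

    // Simplified legacy read: first four bytes as a big-endian int of seconds.
    public static int legacySeconds(byte[] eight) {
        return ByteBuffer.wrap(eight, 0, 4).getInt();
    }

    // Unified read: all eight bytes as a big-endian long of nanoseconds.
    public static long unifiedNanos(byte[] eight) {
        return ByteBuffer.wrap(eight).getLong();
    }

    public static void main(String[] args) {
        long nanos = 1_234_567_890_123L; // some nanosecond timestamp
        byte[] bytes = ByteBuffer.allocate(8).putLong(nanos).array();

        // Both reads "succeed" on the same bytes; nothing in the stream
        // says which interpretation the writer intended.
        System.out.println("unified: " + unifiedNanos(bytes)); // 1234567890123
        System.out.println("legacy seconds: " + legacySeconds(bytes)); // 287
    }
}
```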
In my patch, I try to maintain backward compatibility as much as possible. If a timestamp is in the range representable by the old format, it is serialized using the old format. The extended format I've proposed and implemented for the full timestamp range builds on top of the existing one and can be unambiguously distinguished from the old format by examining the serialized bytes.
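For the actual bit layout, see the patch; purely to illustrate the idea of a self-describing encoding, here is a hypothetical sketch. The assumption (mine, for this sketch only) is that the legacy path is used only when its first byte is guaranteed to have the high bit clear, which frees that bit to mark the extended format:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (names and bit layout are illustrative, not the
// actual patch): legacy encoding is emitted only when its first byte has
// the high bit clear, so the extended format can claim a marker byte with
// the high bit set and a reader can dispatch on one byte, unambiguously.
public class FormatTagSketch {
    private static final byte EXTENDED_MARKER = (byte) 0x80;

    public static byte[] serialize(long seconds) {
        if (seconds >= 0 && seconds <= Integer.MAX_VALUE) {
            // "old" format: a plain 4-byte int; since seconds >= 0 here,
            // the first byte is in 0x00..0x7F -- high bit clear.
            return ByteBuffer.allocate(4).putInt((int) seconds).array();
        }
        // extended format: marker byte with the high bit set, then 8 bytes.
        return ByteBuffer.allocate(9)
                .put(EXTENDED_MARKER).putLong(seconds).array();
    }

    public static long deserialize(byte[] b) {
        if ((b[0] & 0x80) != 0) {                    // extended format
            return ByteBuffer.wrap(b, 1, 8).getLong();
        }
        return ByteBuffer.wrap(b, 0, 4).getInt();    // legacy format
    }
}
```

Old readers see old-format bytes for in-range values exactly as before; only out-of-range timestamps get the new encoding.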
In addition, the included test, TestTimestampWritable, covers both the old and the new (extended) format, as well as double/BigDecimal conversion, getters/setters/constructors, and everything else I could test in TimestampWritable.
I am sure there is a way to handle vectorized timestamp operations in a backward-compatible way, and I don't think this patch would make that much more complicated than it already is. However, vectorized computation is a performance optimization, while this issue is a correctness fix: currently, timestamps outside the ~1970-2038 range are silently corrupted by some queries, and this patch fixes that. It is also fairly small and immediately available.
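The corruption mentioned above is ordinary 32-bit wrap-around; a two-line illustration (2_500_000_000 is just an arbitrary post-2038 epoch second, chosen by me for the example):

```java
public class TruncationDemo {
    public static void main(String[] args) {
        // An epoch-seconds value in 2049, past what a signed 32-bit field
        // can represent (that range ends in January 2038).
        long seconds2049 = 2_500_000_000L;

        // Squeezing it through a 4-byte field silently wraps around:
        int truncated = (int) seconds2049;

        System.out.println(truncated); // -1794967296, i.e. a date in 1913
    }
}
```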