Looks great! I'm glad you were able to make this fast.
A few ideas:
- I like the switch with corruption-check on DiskDV. Can we easily integrate this into Lucene42?
- Can we update the file format docs? (We attempt to describe the numerics strategies succinctly there.)
I can do a more thorough review and some additional testing later, but this looks awesome.
Later we should think about a place (maybe the codec file format docs, maybe even NumericDocValuesField?) to add some practical general guidelines for users that might not otherwise be intuitive. For example: if you are putting Dates in NumericDV, zero out the portions you don't care about (e.g. milliseconds, time of day) to save space; indexing as UTC will be a little more efficient than indexing with a local offset; and so on.
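To make the date guideline concrete, here is a minimal sketch of what "zeroing out" could look like before the value goes into a numeric field. The class and method names are hypothetical, it uses only `java.time`, and it assumes (this thread does not spell it out) that a codec can save space when all values share a large common divisor.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Lucene: illustrates truncating dates
// before storing them as numeric doc values.
public class DateTruncationSketch {

    /** Drops the time-of-day portion of an epoch-millis timestamp, in UTC. */
    static long truncateToDayUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                .truncatedTo(ChronoUnit.DAYS)
                .toEpochMilli();
    }

    /** Euclid's algorithm on absolute values; gcd(0, x) == |x|. */
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Greatest common divisor of all values (0 for an empty array). */
    static long commonDivisor(long[] values) {
        long g = 0;
        for (long v : values) {
            g = gcd(g, v);
        }
        return g;
    }

    public static void main(String[] args) {
        // Arbitrary sample timestamps with millisecond precision.
        long[] raw = {1366000000123L, 1366090000456L, 1366180000789L};
        long[] truncated = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            truncated[i] = truncateToDayUtc(raw[i]);
        }
        System.out.println("common divisor, raw:       " + commonDivisor(raw));
        System.out.println("common divisor, truncated: " + commonDivisor(truncated));
    }
}
```

After truncation every value is a multiple of 86,400,000 (milliseconds per day), which is the kind of regularity a numeric encoding could exploit. Values truncated in a local time zone and then stored as epoch millis are generally not multiples of a full day, which is one plausible reason UTC would compress a little better.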
Improves BaseDocValuesFormatTest, which tested almost nothing but "TABLE_COMPRESSED" with Lucene42DVF.
Yeah, this is a good catch! We should also maybe open an issue to review DiskDV and try to make it more efficient. Optimizations like TABLE_COMPRESSED don't exist there, I think; they could be handy if someone wants e.g. a smallfloat scoring factor. It's nice that this patch provides back compat for DiskDV, but that's not strictly necessary in the future if we want to review and rewrite it. In general that codec was done very quickly and hasn't seen much benchmarking; it could use some work.
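For readers unfamiliar with the idea behind a table-based strategy, a rough sketch follows. It is an illustration of the general technique only, not the Lucene42 implementation: the class and method names are hypothetical, and the real format's thresholds and on-disk layout are not shown here.

```java
import java.util.Arrays;

// Hypothetical illustration of table compression: when a field has only a
// few distinct values, store a small lookup table plus one small ordinal
// per document instead of a full long per document.
public class TableCompressionSketch {

    /** Sorted array of the distinct values. */
    static long[] buildTable(long[] values) {
        return Arrays.stream(values).distinct().sorted().toArray();
    }

    /** Replaces each value with its index in the sorted table. */
    static int[] encode(long[] values, long[] table) {
        int[] ords = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            ords[i] = Arrays.binarySearch(table, values[i]);
        }
        return ords;
    }

    /** Looks a single document's value back up from its ordinal. */
    static long decode(int ord, long[] table) {
        return table[ord];
    }

    /** Bits needed to represent ordinals 0..tableSize-1 (at least 1). */
    static int bitsPerValue(int tableSize) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(tableSize - 1));
    }

    public static void main(String[] args) {
        // E.g. per-document scoring factors that take only a few values.
        long[] values = {7, 3, 7, 3, 9};
        long[] table = buildTable(values);
        int[] ords = encode(values, table);
        System.out.println("table = " + Arrays.toString(table));
        System.out.println("ords  = " + Arrays.toString(ords));
        System.out.println("bits per value = " + bitsPerValue(table.length));
    }
}
```

A field like a smallfloat-encoded scoring factor tends to have few distinct values, so each document would need only a handful of bits under a scheme like this.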