Currently the time zones are hard-coded in timezone_db.cc, and they do not take into account that time zone definitions change from year to year (the only exception is Moscow, CDH-19918).
I suggest moving the time zone info into a separate config file, so that admins can update it if necessary, and providing tools for updating it from well-known sources.
1) Define an Impala-friendly file format for time zone data (preferably human-editable as well; even more preferably, a format that other similar systems already use).
2) Create a tool to extract time zone data from the IANA tzdata database or /usr/share/zoneinfo into the format specified in 1).
3) The file location (local path or HDFS path) should be part of the configuration.
4) Backends should load the tzinfo into a fast in-memory structure (quick lookup by zone id + date). Maybe load and cache each time zone on demand, since most of them will never be used.
5) All date functions should use this generic in-memory tzinfo.
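A minimal sketch of how items 1), 4) and 5) could fit together, assuming a hypothetical line-based text format (`<zone_id> <utc_start_seconds> <utc_offset_seconds> <abbrev>`, with `#` comments). That format and the names `Transition`, `ZoneInfo`, `ParseZone`, `TzDatabase` and `UtcToLocal` are illustrative only. They are not existing Impala code or APIs. Each zone is parsed only the first time it is requested and then cached. A lookup by zone id + UTC instant is a binary search over that zone's sorted transitions. The reverse direction (local to UTC, with DST gaps and overlaps) is not handled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One offset period: in effect from utc_start (inclusive) until the next transition.
struct Transition {
  int64_t utc_start;   // seconds since the Unix epoch, UTC
  int32_t utc_offset;  // seconds east of UTC
  std::string abbrev;  // e.g. "MSK"
};

// All transitions of one zone, sorted by utc_start.
struct ZoneInfo {
  std::vector<Transition> transitions;

  // Returns the transition in effect at utc_seconds, or nullptr if the
  // instant precedes the first known transition.
  const Transition* Lookup(int64_t utc_seconds) const {
    auto it = std::upper_bound(
        transitions.begin(), transitions.end(), utc_seconds,
        [](int64_t t, const Transition& tr) { return t < tr.utc_start; });
    if (it == transitions.begin()) return nullptr;
    return &*(it - 1);
  }
};

// Collects only the lines for zone_id from the hypothetical text format.
// Blank lines, '#' comments and malformed lines are skipped.
std::optional<ZoneInfo> ParseZone(std::istream& in, const std::string& zone_id) {
  ZoneInfo zone;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    std::istringstream fields(line);
    std::string id;
    Transition tr;
    if (!(fields >> id >> tr.utc_start >> tr.utc_offset >> tr.abbrev)) continue;
    if (id == zone_id) zone.transitions.push_back(tr);
  }
  if (zone.transitions.empty()) return std::nullopt;
  std::sort(zone.transitions.begin(), zone.transitions.end(),
            [](const Transition& a, const Transition& b) { return a.utc_start < b.utc_start; });
  return zone;
}

// Loads zones on demand from the configured file and caches them
// (including misses, as nullptr). Thread-safe via a single mutex.
class TzDatabase {
 public:
  explicit TzDatabase(std::string path) : path_(std::move(path)) {}

  std::shared_ptr<const ZoneInfo> Get(const std::string& zone_id) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(zone_id);
    if (it != cache_.end()) return it->second;
    std::ifstream in(path_);
    std::shared_ptr<const ZoneInfo> zone;
    if (auto parsed = ParseZone(in, zone_id)) {
      zone = std::make_shared<ZoneInfo>(std::move(*parsed));
    }
    cache_.emplace(zone_id, zone);
    return zone;
  }

 private:
  std::string path_;
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<const ZoneInfo>> cache_;
};

// How a date function might convert a UTC instant to local wall-clock seconds.
std::optional<int64_t> UtcToLocal(TzDatabase& db, const std::string& zone_id,
                                  int64_t utc_seconds) {
  auto zone = db.Get(zone_id);
  if (!zone) return std::nullopt;
  const Transition* tr = zone->Lookup(utc_seconds);
  if (!tr) return std::nullopt;
  return utc_seconds + tr->utc_offset;
}
```

Scanning the whole file on every cache miss keeps the sketch short. A real implementation might instead use one file per zone, as /usr/share/zoneinfo does, or build an index of zone ids on first load.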
Regarding 3), some reasons to make this configurable and to keep 2) a manual step:
- tzinfo is not perfectly standardised, so automatic solutions might fail on some OSes.
- tzinfo on different hosts might be out of sync. Good luck debugging such cases...
- We wouldn't want query results to change automagically/unexpectedly on an OS upgrade.
- We should give admins the ability to override or fine-tune tz data if their applications require it.