Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- impala 2.3
Description
Currently the time zones are hard-coded in timezone_db.cc, and they do not take into account that time zone definitions change from year to year (except for Moscow, CDH-19918).
I suggest moving the time zone info into a separate config file, so that admins can update it if necessary, and providing tools for updating it from well-known sources.
1) Define an Impala-friendly file format for timezone data (preferably human-editable as well, and ideally a format that other similar systems already use).
2) Create a tool to extract timezone data from the IANA tzdata database or /usr/share/zoneinfo into the format specified.
3) The file location (local path or HDFS path) should be part of the configuration.
4) Backends should load the tzinfo into a fast in-memory structure (quick lookup by id + date), possibly loading and caching each time zone on demand, since most of them will never be used.
5) All date functions should use this generic in-memory tzinfo.
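To make 4) and 5) more concrete, here is a minimal sketch of an on-demand cache with lookup by id + instant. Everything in it is an assumption for illustration: the names `Transition`, `TimeZone`, `TimeZoneCache` and `Loader` are hypothetical, the loader callback stands in for whatever reads the configured file, and this is not Impala's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record: from `utc_start` (seconds since the Unix epoch) onward,
// local time = UTC + utc_offset_sec, until the next transition.
struct Transition {
  int64_t utc_start;
  int32_t utc_offset_sec;
};

// All transitions of one zone, kept sorted ascending by utc_start.
class TimeZone {
 public:
  explicit TimeZone(std::vector<Transition> transitions)
      : transitions_(std::move(transitions)) {
    std::sort(transitions_.begin(), transitions_.end(),
              [](const Transition& a, const Transition& b) {
                return a.utc_start < b.utc_start;
              });
  }

  // Offset in effect at `utc_sec`, found by binary search. Instants before the
  // first transition reuse the first entry's offset, and a zone with no data
  // is treated as UTC; both are simplifications of this sketch.
  int32_t OffsetAt(int64_t utc_sec) const {
    if (transitions_.empty()) return 0;
    auto it = std::upper_bound(
        transitions_.begin(), transitions_.end(), utc_sec,
        [](int64_t t, const Transition& tr) { return t < tr.utc_start; });
    if (it == transitions_.begin()) return it->utc_offset_sec;
    return std::prev(it)->utc_offset_sec;
  }

 private:
  std::vector<Transition> transitions_;
};

// Loads each zone the first time it is requested and keeps it afterwards,
// so zones that are never used are never parsed.
class TimeZoneCache {
 public:
  // Assumed hook: returns the transitions for one zone id, e.g. by reading
  // the configured tz data file.
  using Loader = std::function<std::vector<Transition>(const std::string&)>;

  explicit TimeZoneCache(Loader loader) : loader_(std::move(loader)) {}

  std::shared_ptr<const TimeZone> Get(const std::string& id) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = zones_.find(id);
    if (it != zones_.end()) return it->second;
    auto zone = std::make_shared<const TimeZone>(loader_(id));
    zones_.emplace(id, zone);
    return zone;
  }

  // Number of zones loaded so far.
  std::size_t LoadedCount() {
    std::lock_guard<std::mutex> lock(mu_);
    return zones_.size();
  }

 private:
  Loader loader_;
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<const TimeZone>> zones_;
};
```

A date function would then call something like `cache.Get(id)->OffsetAt(utc_sec)` and add the result to the UTC value. The single mutex keeps the sketch short; a real backend would probably want something cheaper on the hot path.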
Regarding 2), similar tools:
http://www.oracle.com/technetwork/java/javase/tzupdater-readme-136440.html
http://dev.mysql.com/doc/refman/5.7/en/mysql-tzinfo-to-sql.html
Regarding 3), some reasons to make this configurable and to make 2) a manual step:
- tzinfo is not perfectly standardised, so automatic solutions might fail on some OSes
- tzinfo on different hosts might be out of sync. Good luck with debugging such cases...
- we wouldn't want query results to change automagically/unexpectedly on an OS upgrade
- we should give admins the possibility to override / fine-tune tz data if their applications require it.
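For the out-of-sync concern, one cheap aid (an assumption of this sketch, not something the proposal requires) would be for each backend to log a fingerprint of the tz data it loaded, so that mismatched hosts are easy to spot. The function name `TzDataFingerprint` is hypothetical; the hash is plain 64-bit FNV-1a, which is fine for spotting accidental differences but is not a cryptographic check.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash over the raw bytes of the loaded tz data file.
// Hosts that loaded identical bytes report identical fingerprints.
uint64_t TzDataFingerprint(const std::string& bytes) {
  uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 1099511628211ULL;  // FNV 64-bit prime
  }
  return hash;
}
```

Comparing the logged values across hosts would show whether they all loaded the same file contents.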
Issue Links
- breaks
  - IMPALA-7200 Local filesystem dataload fails due to missing FILESYSTEM_PREFIX (Resolved)
- incorporates
  - IMPALA-1577 Improve from_utc_timestamp perf which is unpredictable and slow (Resolved)
  - IMPALA-5563 Timezone lookup may be ambiguous (Closed)
- is depended upon by
  - IMPALA-3169 to_utc_timestamp blank value (Closed)
- is duplicated by
  - IMPALA-5978 DST is handled incorrectly for EET (Resolved)
  - IMPALA-3082 BST between 1972 and 1995 (Resolved)
- is related to
  - IMPALA-5978 DST is handled incorrectly for EET (Resolved)
  - IMPALA-5203 from_utc_timestamp inconsistent how it handles daily savings time (Resolved)
  - IMPALA-3316 convert_legacy_hive_parquet_utc_timestamps=true makes reading parquet tables 30x slower (Resolved)
  - IMPALA-7085 Consider patching Google/CCTZ for Impala's needs (Open)
  - IMPALA-6763 to_utc_timestamp() doesn't consider daylight saving for all timezones (Resolved)
  - IMPALA-3169 to_utc_timestamp blank value (Closed)
- relates to
  - IMPALA-5563 Timezone lookup may be ambiguous (Closed)
  - IMPALA-7060 Restrict Impala to only support timezones that work in Hive (IANA + Java) (Resolved)