Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
ghx-label-2
Description
Google/CCTZ was chosen as Impala's new timezone database in IMPALA-3307 and benchmarks have shown significant improvements compared to the previous libc / boost solution (see convert-timestamp-benchmark.cc in https://gerrit.cloudera.org/#/c/9986/). If further optimizations are needed, it may become necessary to modify Google/CCTZ to serve Impala's needs with less overhead.
I see 3 points where avoidable overhead is added to UTC<->local conversions:
1. Timezone conversions use atomic integer hints that speed up rule lookup if the timestamp is close to the one in the last call ( see https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/src/time_zone_info.cc#L828 ). This may be detrimental if the function is called from different threads with far away timestamps. As the same class that contains hints contains the vector of rule transitions (which can be large), creating a copy for every thread would be expensive.
The solution would be to move the hints to a new class whose instances would not be shared between threads.
2. CCTZ handles local time in civil_seconds, which contains year/month/day/hour/minutes/seconds in separate fields, while Impala stores timestamps in "days since some epoch" (boost::gregorian::date)/"nanoseconds till midnight" (boost::posix_time::time_duration). A modified version of TimeZoneInfo::MakeTime() could skip this back and forth conversion.
3. Timezone conversions use a virtual function, while the backing class is always https://github.com/google/cctz/blob/master/src/time_zone_info.h.
This could be probably avoided without modifying CCTZ by including some of its headers in https://github.com/google/cctz/tree/master/src
Attachments
Issue Links
- relates to
-
IMPALA-3307 add support for IANA time zone database
- Resolved