Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5609

Refactor TimestampFunctions and TimestampValue

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:

      Description

      The Timestamp-related code in the backend is getting very confusing. We have several logical groups of timestamp related code:

      • TIMESTAMP functions (i.e. public functions)
      • The Timestamp types: TimestampValue and it's UDF cousin TimestampVal
      • Various conversion functions between different kinds of types/formats e.g. TimestampVal, TimestampValue, unix times, ptimes, strings w/ formats.

      Things got particularly hairy when the --use_local_tz_for_unix_timestamp_conversions flag was added. The purpose of that flag was to enable Hive/MySQL compatibility for the unix_timestamp(...) and from_unixtime(...) SQL functions. However, the logic that handles this flag (thus determining whether or not to convert to/from local time & UTC) lives in TimestampValue. This was a mistake: a TimestampValue should only represent a ptime. This has led to a lot of confusion and bugs when code using TimestampValue ended up getting unwanted timezone conversions.

      We should clean up the code by:

      1) ensuring that TimestampFunctions contains only the SQL functions that get exposed. The handling of --use_local_tz_for_unix_timestamp_conversions should be in TimestampFunctions, because it should only affect SQL functions and not the raw data types themselves. (see below).
      2) move all the 'Unix time' logic out of TimestampValue and into a TimestampUtil class. It can expose Unix time functions for cases involving local time and UTC times, and TimestampFunctions can call the appropriate functions. The timezones should not bleed into TimestampValue.
      3) TimestampValue should basically just be a wrapper around a ptime.

      Proposed naming convention for TimestampFunctions:

      [SrcTz][SrcType[precision]]To[DstTz][DstType[precision]]
      

      SrcTz and/or DstTz is only specified if:

      • the function has nonstandard behavior, e.g. a Unix time to Timestamp conversion which always returns a UTC Timestamp regardless of the setting of FLAGS_use_local_tz_for_ts_conv
      • there is an explicit tz conversion, e.g. the existing FromUtc/ToUtc methods (which would be renamed to something consistent with the new scheme)

      SrcType/DstType will always be specified (unlike today where there are inconsistent cases where we try to use shorter names, but inconsistently). Right now, these are:

      • Unix (implies Seconds)
      • UnixMicros
      • Timestamp (implies TimestampVal)
      • String (implies Date and/or Time string, fmt string may be a separate parameter or the default formatting)

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              mjacobs Matthew Jacobs
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: