Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Impala aims to support TIMESTAMP for Kudu tables. Because Impala’s TIMESTAMP type is a 96-bit type with nanosecond precision and Kudu’s timestamp is a 64-bit microsecond delta from the Unix epoch (called UNIXTIME_MICROS), a conversion will be necessary.
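To make the Kudu side of the conversion concrete, here is a minimal Python sketch of what a UNIXTIME_MICROS value means: a signed microsecond count relative to 1970-01-01 UTC. The helper names are hypothetical and are not Impala or Kudu APIs. Python's datetime has microsecond resolution, so it mirrors the Kudu representation and cannot hold Impala's extra nanosecond digits.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, the zero point of a UNIXTIME_MICROS value.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_micros_to_datetime(micros: int) -> datetime:
    """Interpret a microsecond delta from the Unix epoch as a UTC datetime.

    Hypothetical helper for illustration only.
    """
    return EPOCH + timedelta(microseconds=micros)


def datetime_to_unix_micros(ts: datetime) -> int:
    """Inverse of the above, using integer arithmetic to avoid float error."""
    delta = ts - EPOCH
    # timedelta normalizes so that seconds and microseconds are non-negative,
    # which keeps this formula correct for pre-epoch values as well.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

For example, `unix_micros_to_datetime(1_500_000)` is 1.5 seconds after the epoch, and feeding the result back through `datetime_to_unix_micros` returns the original integer.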

        Activity

        mjacobs Matthew Jacobs added a comment -

        main patch has been committed:
        commit a16a0fa84d77f96e428b278f8cc37ebd7a49899f
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Tue Mar 28 19:05:03 2017 -0700

        IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP

        Adds Impala support for TIMESTAMP types stored in Kudu.

        Impala stores TIMESTAMP values in 96 bits and has nanosecond
        precision. Kudu's timestamp is a 64-bit microsecond delta
        from the Unix epoch (called UNIXTIME_MICROS), so a conversion
        is necessary.

        When writing to Kudu, TIMESTAMP values in nanoseconds are
        rounded to the nearest microsecond.

        When reading from Kudu, the KuduScanner returns
        UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
        the value to a TimestampValue in-line and copy the entire
        row.

        Testing:
        Updated the functional_kudu schema to use TIMESTAMPs instead
        of converting to STRING, so this provides some decent
        coverage. Some BE tests were added, and some EE tests as
        well.

        TODO: Support pushing down TIMESTAMP predicates
        TODO: Support TIMESTAMPs in range partitioning expressions

        Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
        Reviewed-on: http://gerrit.cloudera.org:8080/6526
        Reviewed-by: Matthew Jacobs <mj@cloudera.com>
        Tested-by: Impala Public Jenkins
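The write and read paths described in the commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the patch: the function names, the round-half-up tie rule, the 16-byte slot size, and the in-memory layout (8-byte nanoseconds-of-day followed by a 4-byte day count, little-endian) are all assumptions made for the example.

```python
import struct

NANOS_PER_MICRO = 1_000
MICROS_PER_DAY = 86_400 * 1_000_000
SLOT_SIZE = 16  # assumed: 8-byte UNIXTIME_MICROS plus 8 bytes of padding


def nanos_to_micros_rounded(nanos: int) -> int:
    """Write path: reduce nanoseconds to the nearest microsecond.

    Ties round up here; the tie rule used by Impala is not stated in
    the commit message, so this choice is an assumption.
    """
    return (nanos + NANOS_PER_MICRO // 2) // NANOS_PER_MICRO


def convert_slot_in_place(row: bytearray, offset: int) -> None:
    """Read path: overwrite a padded microsecond slot with a 12-byte value.

    Because the slot already has room for the wider value, the row can be
    converted where it sits and then copied as a whole.
    """
    (micros,) = struct.unpack_from("<q", row, offset)
    # Floor division keeps micros_of_day non-negative for pre-epoch values.
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    # Assumed layout: int64 nanoseconds of day, then int32 days since epoch.
    struct.pack_into("<qi", row, offset, micros_of_day * NANOS_PER_MICRO, days)


# Usage: one day plus one microsecond after the epoch.
row = bytearray(SLOT_SIZE)
struct.pack_into("<q", row, 0, MICROS_PER_DAY + 1)
convert_slot_in_place(row, 0)
nanos_of_day, days = struct.unpack_from("<qi", row, 0)
```

The point of the padding is visible in `convert_slot_in_place`: the 12 bytes written fit inside the 16-byte slot, so no second buffer or per-column copy is needed.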

        jrussell John Russell added a comment - edited

        Todd Lipcon / Greg Rahn:

        Impala exposes a couple of built-in functions related to UNIXTIME_MICROS values:

        • utc_to_unix_micros() was added as part of IMPALA-5137 in May 2017.
        • timestamp_from_unix_micros() was added in IMPALA-5338 (May 2017) and renamed to unix_micros_to_utc_timestamp() in IMPALA-5539 (July 2017).

        However, I wasn't asked at the time to document them and they aren't mentioned in the "spec" of Kudu TIMESTAMP support. I'm asking for a ruling:

        Are utc_to_unix_micros() and unix_micros_to_utc_timestamp() intended to be user-callable functions that should be documented, or are they internal-only functions that should be hidden from the SHOW FUNCTIONS output? The fact that one of them was renamed makes me think they were not intended to be user-visible or user-callable.

        grahn Greg Rahn added a comment -

        John Russell - Seems like public functions to me.
        see https://gerrit.cloudera.org/#/c/7311/


          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            mjacobs Matthew Jacobs