IMPALA-3557

Add workaround to create BIGINT stored as Kudu's UNIXTIME_MICROS

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Kudu_Impala
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Kudu exposes a type called UNIXTIME_MICROS, which is essentially an int64 intended to store microseconds since the Unix epoch. Impala's TIMESTAMP type differs significantly (an INT96 storing something based on boost's "date" and "time_duration" classes).
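As a rough illustration of the int64 representation described above, the following Python sketch converts between microseconds since the Unix epoch and a UTC datetime. The function names are hypothetical and chosen only for this example; this is not Kudu or Impala code.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Interpret an int64-style value as microseconds since the Unix epoch."""
    return EPOCH + timedelta(microseconds=micros)


def datetime_to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole microseconds since the epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Dividing one timedelta by another with // yields an exact integer.
    return (dt - EPOCH) // timedelta(microseconds=1)
```

Python's datetime already has microsecond resolution, so the round trip is lossless within the range datetime can represent.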

      Users would like an Impala TIMESTAMP that works with Kudu, but this isn't feasible yet. For now, Impala will support creating Kudu tables with this underlying UNIXTIME_MICROS type but treating it otherwise as a BIGINT. In the future, Impala and Kudu will support a shared TIMESTAMP type. That work has yet to be planned.

      For now, Impala will provide syntax in CREATE TABLE for Kudu tables to create integer columns with a hint (a comment containing a specific string). The hint tells Impala to create the underlying column as a Kudu UNIXTIME_MICROS type, but the column is an integer as far as Impala is concerned. Kudu's UNIXTIME_MICROS type is just an int64, so it will be read and returned as such. All operations in Impala (e.g. describe table, select stmts, insert stmts) treat this column as a BIGINT, because to Impala it is one.
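The comment-hint idea can be sketched as follows. This is only an illustration of the proposal, not Impala's implementation: the ticket does not specify the hint string, so `HINT` below is an invented placeholder, and `Column` and `kudu_storage_type` are hypothetical names. The sketch also assumes the hint is honored only on BIGINT columns.

```python
from dataclasses import dataclass

# Invented placeholder: the ticket never says what the actual hint string is.
HINT = "kudu_type:unixtime_micros"

# Integer types as Impala sees them, mapped to the Kudu storage type.
DEFAULT_KUDU_TYPES = {
    "TINYINT": "INT8",
    "SMALLINT": "INT16",
    "INT": "INT32",
    "BIGINT": "INT64",
}


@dataclass
class Column:
    name: str
    impala_type: str  # the type Impala reports, e.g. in DESCRIBE
    comment: str = ""


def kudu_storage_type(col: Column) -> str:
    """Choose the underlying Kudu type; Impala's view of the column is unchanged."""
    if col.impala_type not in DEFAULT_KUDU_TYPES:
        raise ValueError(f"unsupported type in this sketch: {col.impala_type}")
    # Assumption: only a BIGINT column carrying the hint is stored as UNIXTIME_MICROS.
    if col.impala_type == "BIGINT" and HINT in col.comment.lower():
        return "UNIXTIME_MICROS"
    return DEFAULT_KUDU_TYPES[col.impala_type]
```

Note that the storage type depends on the comment and not only on the Impala type, so any code path that writes the column would also need access to that extra piece of column metadata.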

      Work items:
      Add support for hint in CREATE TABLE
      Add support for hint in ALTER TABLE

        Activity

        fish0515_impala_49b1 fishing added a comment -

        We have implemented this function, but there is no convenient way to commit the code.

        mjacobs Matthew Jacobs added a comment -

        Impala timestamps (16 bytes) can represent dates in the range 1400-01-01 to 9999-12-31, and store nanosecond precision.
        Kudu timestamps (8 bytes) represent microseconds since 1/1/1970 (UTC epoch), and I think they may reserve some bits for timezone info to be used in the future. I'm not sure what their min/max supported dates are, but they almost certainly won't line up with ours.
        David Alves can you comment on what is actually implemented for Kudu TIMESTAMP? Is it what you said in https://issues.apache.org/jira/browse/KUDU-925 , 59 bits for micros +/- UTC epoch?

        When reading:
        We'd have to handle dates outside of our supported range. Should we truncate to our min/max? NULL? Is it a warning (which would be an error when ABORT_ON_ERROR=true)?

        When writing to Kudu:
        We'll have to lose precision. We might have to do something with dates that are outside of their range.
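To make the open questions above concrete, here is a small Python sketch of the two directions. It assumes the Impala range quoted in this comment (1400-01-01 to 9999-12-31) and models the three candidate behaviours for out-of-range values as a `policy` argument. All names are hypothetical, and nothing here reflects a decision made on the ticket.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)

# Impala's TIMESTAMP range as quoted in the comment, in microseconds since the epoch.
IMPALA_MIN_MICROS = (datetime(1400, 1, 1, tzinfo=timezone.utc) - EPOCH) // ONE_MICRO
IMPALA_MAX_MICROS = (
    datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc) - EPOCH
) // ONE_MICRO


def read_micros(micros: int, policy: str = "null") -> Optional[int]:
    """Reading: handle a Kudu value that falls outside Impala's supported range."""
    if IMPALA_MIN_MICROS <= micros <= IMPALA_MAX_MICROS:
        return micros
    if policy == "clamp":  # truncate to the nearest supported bound
        return max(IMPALA_MIN_MICROS, min(micros, IMPALA_MAX_MICROS))
    if policy == "null":  # surface the value as NULL
        return None
    if policy == "error":  # fail, as ABORT_ON_ERROR=true might
        raise ValueError(f"timestamp out of range: {micros}")
    raise ValueError(f"unknown policy: {policy}")


def nanos_to_micros(nanos: int) -> int:
    """Writing: floor division drops the sub-microsecond digits (rounds toward -inf)."""
    return nanos // 1000
```

The write direction is lossy by construction: two nanosecond values inside the same microsecond map to the same stored value.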

        @Santosh can you weigh in on how Impala timestamps should interact with Kudu timestamps?

        Eventually Impala will migrate to INT64.

        casey is there another JIRA about that, or any other info?

        caseyc casey added a comment -

        I don't see anything that specifically tracks that. I think Ryan Blue was leading that. There is a Cloudera internal design doc and issue. I'll send those to you and Dimitri. It seems like someone needs to take over for Ryan (there's still a bunch of design/coordination that needs to be done).

        mjacobs Matthew Jacobs added a comment -

        Ok, thanks. I also checked with Santosh (PM for datatypes) and it sounds like now that customers are using nanosecond precision we probably can't drop it.

        mjacobs Matthew Jacobs added a comment -

        ping Santosh Kumar cc Dan Hecht

        dhecht Dan Hecht added a comment -

        I'm not sure we should treat Kudu timestamps as the same datatype as Impala's current TIMESTAMP type. We should also look at how Kudu timestamp lines up with the new parquet 64-bit timestamp type.

        mjacobs Matthew Jacobs added a comment -

        Dan Hecht I agree, that seems like the right approach if we're going to have a 64-bit timestamp type. Is that the case? From Santosh Kumar it sounds like we weren't going to do that, and instead just try to fix 96-bit timestamp type (I don't know what the details are there yet).

        dhecht Dan Hecht added a comment -

        Eventually we will have to support the 64-bit version, but have no plans to do it in the immediate future.

        mjacobs Matthew Jacobs added a comment -

        I updated the ticket to our current plan. Kudu's TIMESTAMP has been renamed UNIXTIME_MICROS to more accurately reflect what it is. Impala has a plan to expose this minimally for the next release, and it's a future work item for Kudu & Impala to support a real TIMESTAMP type that works for both.

        mjacobs Matthew Jacobs added a comment -

        Implementing this as a workaround wouldn't be as easy as we thought, because the write path for these special BIGINT cols would need to know to write the type as UNIXTIME_MICROS instead of the type Impala thinks it is (BIGINT). We'd have to store extra col metadata for this to work, which seems pretty hacky and is non-trivial work. I think this workaround just isn't going to be feasible. We should focus on a good timestamp solution for the next release.

        mjacobs Matthew Jacobs added a comment -

        We changed the timestamp plans for Kudu; see IMPALA-5137.


          People

          • Assignee: Unassigned
          • Reporter: caseyc casey
          • Votes: 0
          • Watchers: 7
