  1. IMPALA
  2. IMPALA-8180

Change Kudu timestamp writer to round towards minus infinity

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      Kudu stores timestamps as int64 microseconds since the Unix epoch, so Impala has to reduce its nanosecond timestamps to microsecond precision before writing them to Kudu tables. Currently this is done by rounding to the nearest microsecond. Hive, by contrast, rounds towards minus infinity when reducing the precision of timestamps. In my opinion this is the better approach, because it cannot move a timestamp into a different day, and it should also be slightly faster.

      Changing the rounding method is a breaking change, so I would only do this in the next major release.

      Example:
      create table tkudu (id int primary key, t timestamp) stored as kudu;
      insert into tkudu values
      (1,"1970-01-01 00:00:00.1111111"), -- all sub-second parts have 7 digits
      (2,"1970-01-01 23:59:59.9999999"),
      (3,"1969-12-31 23:59:59.9999999");
      select * from tkudu;

      This currently returns:
      1,1970-01-01 00:00:00.111111000
      2,1970-01-02 00:00:00
      3,1970-01-01 00:00:00

      Row 1 was rounded down to microsecond precision, while rows 2 and 3 were rounded up and, as a result, moved into the following day.

      If the table was written using rounding toward minus infinity, then the query would return this:
      1,1970-01-01 00:00:00.111111000
      2,1970-01-01 23:59:59.999999000
      3,1969-12-31 23:59:59.999999000
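
      The two rounding modes can be compared with a small sketch. This is only an illustration of the arithmetic, not Impala's or Kudu's actual code: the helper names (to_nanos, round_nearest, round_floor, from_micros) are hypothetical, and the tie-breaking rule for round-to-nearest (halves round up) is an assumption that does not affect the three example rows.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_MICRO = 1000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanos(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' into nanoseconds since the Unix epoch."""
    whole, _, frac = text.partition(".")
    seconds_part = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    delta = seconds_part - EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    # Pad the fraction to 9 digits so "9999999" becomes 999999900 ns.
    return whole_seconds * 1_000_000_000 + int(frac.ljust(9, "0"))


def round_nearest(nanos: int) -> int:
    """Round to the nearest microsecond (assumed here: halves round up)."""
    return (nanos + NANOS_PER_MICRO // 2) // NANOS_PER_MICRO


def round_floor(nanos: int) -> int:
    """Round towards minus infinity; Python's // already floors negative values."""
    return nanos // NANOS_PER_MICRO


def from_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


if __name__ == "__main__":
    for text in (
        "1970-01-01 00:00:00.1111111",
        "1970-01-01 23:59:59.9999999",
        "1969-12-31 23:59:59.9999999",
    ):
        nanos = to_nanos(text)
        print(text, from_micros(round_nearest(nanos)), from_micros(round_floor(nanos)), sep=" | ")
```

      With flooring, every value stays within the day of the original timestamp, since the result can never exceed the input. With rounding to nearest, rows 2 and 3 land exactly on the next midnight.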

          People

            Assignee: Unassigned
            Reporter: Csaba Ringhofer (csringhofer)