Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-4010

Auto adjust offset according to query server's timezone for time derived column

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • v3.0.0-alpha
    • v3.0.0
    • Others
    • None

    Description

      Important!!! This issue is re-implemented, following message in this page is out of date, please see this doc http://kylin.apache.org/docs/tutorial/lambda_mode_and_timezone_realtime_olap.html for the lastest information. (Update at 2020-01-19)

      Backgroud

      In realtime OLAP, we index real-time event in streaming receiver. We know that each event must contains a timestamp column (we often call it event time), that value should represent when this event was produced. Because event maybe come from different timezone and use local timezone is always error-prone, so we recommend to use a GMT+0 timestamp(System.currentTimeMillis()) to avoid such issue.

      I think this is good by design, it is easy to understand and always correct. But the side effect is that, the end user(business manager behind a BI tools) are unhappy because he have to use GMT+0 with date/time related filter in SQL and should understand the result should be shifted with his local timezone. It is not user-firendly and inconvenient for normal user. Because user may compare query result from different data source and compare them and summarize, use GMT+0 may trouble them.

      Example

      For example, kylin user work in GMT+8 (maybe in Shanghai) want to know some metrics which occured from 2019-09-01 12:00:00 to 2019-09-01 14:00:00 in his local timezone, so he has to rewrite his query (with eight hour offset) to following:

      select hour_start, count(*)
      from realtime_table
      where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 06:00:00"  
      group by hour_start
      

      And he will get result like :

      hour_start count
      2019-09-01 04:00:00 139202
      2019-09-01 05:00:00 89398

      And he must convert to a more meaningful result in his mind, it is realy annoying!

      hour_start count
      2019-09-01 12:00:00 139202
      2019-09-01 13:00:00 89398

      Desgin

      We should not change the way receiver index event, event time should be stored in UTC timestamp. We should auto rewrite sql's event time related filter.

      In kylin, filter condition in where clause will be convert to a TupleFilter, and it looks like RelNode in Apache Calicate.
      For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send TupleFilter to streaming receiver or region server which looks like this:

      AND
        GreatThanOrEqual
          hout_start
          CAST
            "2019-09-01 12:00:00"
            timestamp
        LessThanOrEqual
          hout_start
          CAST
            "2019-09-01 14:00:00"
            timestamp
      

      But for streaming query, we want to change each ConstantTupleFilter and minus value for that timestamp. So the TupleFilter which be sent will be following:

      AND
        GreatThanOrEqual
          hout_start
          CAST
            "2019-09-01 04:00:00"
            timestamp
        LessThanOrEqual
          hout_start
          CAST
            "2019-09-01 06:00:00"
            timestamp
      

      Before query result processed by OLAPEnumerator, kylin will plus each value of time derived column, thus protect row from be filtered by calcite generated code.

      So, user will get what he want in his timezone without any burden.

      How to use

      To enable auto shift by time zone, please set kylin.stream.auto.just.by.timezone to true.
      You can specific time zone by kylin.web.timezone, otherwise, time zone will be auto detected.
      Only time derived column will be affected.

      Related Issue

      Originally, the event time can only in the format of a long value (UTC timestamp). But in some case, the event time is in a format of "yyyy-MM-dd HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to convert such format into a UTC timestamp.

      Old Describe

      In Real-Time Streaming Cube when I send some records to kafka topic, the tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a segment named 20181231160000_20181231170000.

      Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName for class CubeSegment. I think that it should be config in kylin.properties.

      Attachments

        1. image-2019-07-15-17-15-31-209.png
          216 kB
          Xiaoxiang Yu
        2. image-2019-07-15-17-17-04-029.png
          320 kB
          Xiaoxiang Yu
        3. image-2019-07-15-17-17-39-568.png
          350 kB
          Xiaoxiang Yu
        4. image-2019-09-22-16-35-23-663.png
          257 kB
          Xiaoxiang Yu
        5. image-2019-09-22-16-43-19-248.png
          114 kB
          Xiaoxiang Yu
        6. image-2019-09-22-20-28-02-402.png
          142 kB
          Xiaoxiang Yu
        7. image-2019-09-22-20-35-19-038.png
          169 kB
          Xiaoxiang Yu
        8. image-2019-09-22-20-53-34-720.png
          349 kB
          Xiaoxiang Yu
        9. image-2019-09-24-17-23-42-451.png
          177 kB
          Yaqian Zhang
        10. image-2019-09-24-17-33-29-824.png
          218 kB
          Yaqian Zhang
        11. image-2019-09-24-17-35-47-589.png
          242 kB
          Yaqian Zhang
        12. image-2019-09-24-17-36-39-611.png
          178 kB
          Yaqian Zhang

        Issue Links

          Activity

            People

              hit_lacus Xiaoxiang Yu
              zengrui zengrui
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: