Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-1513

Time partitioning doesn't work across multiple days

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.5.0
    • Fix Version/s: v1.5.1
    • Component/s: None
    • Labels:
      None

      Description

      I was attempting to use the new time partition functionality added in v1.5.0 (https://issues.apache.org/jira/browse/KYLIN-1427). I realize this hasn't been added to the interface yet (https://issues.apache.org/jira/browse/KYLIN-1441), so I'm not sure if this is feature is completely ready yet, but I thought I'd note the issue I ran into:

      • Using the API, I defined the `partition_time_column` and `partition_time_format` attributes on my model:
        {
          "partition_desc": {
            "partition_date_column": "DEFAULT.LOGS.CAL_DATE",
            "partition_date_format": "yyyy-MM-dd",
            "partition_date_column": "DEFAULT.LOGS.CAL_HOUR",
            "partition_date_format": "H",
            "partition_date_start": null,
            "partition_type": "APPEND"
          }
        }
            
      • I then attempted to build the cube for the first time over a multi-day duration (2010-08-16 to 2010-11-01).
      • The cube reported successfully building, but the resulting cube contained no data.

      When I looked at the Kylin logs, it looks like the SQL generated by the time partition won't work if the query spans multiple days. In my case, the WHERE clause used for 2010-08-16 to 2010-11-01 was:

      WHERE (LOGS.CAL_DATE >= '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01' AND LOGS.CAL_HOUR >= '0' AND LOGS.CAL_HOUR < '0')
      

      The issues seems to be that the hour condition is ANDed to the end of the existing date range query: https://github.com/apache/kylin/blob/kylin-1.5.0/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L170-L174 However, this logic doesn't work when the date range spans multiple days. In this case, it's trying to match where the hour column is both >= 0 and < 0 (since I was processing midnight to midnight), which will never match any results. However, if I switched my cube build to end at 02:00, I believe this would lead to some results being built in the cube, but not what's expected (it would only process 00:00-02:00 on each day.

      So while I think the current implementation will work as long as you're building less than 24 hours of data at a time, it would be nice if this could still support multiple-day builds when this time partition is also present.

      I think when a separate time column is present, the SQL generated would need to be something more like:

      WHERE
        (LOGS.CAL_DATE = '2010-08-16' AND LOGS.CAL_HOUR >= '0') OR
        (LOGS.CAL_DATE > '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01') OR
        (LOGS.CAL_DATE = '2010-11-01' AND LOGS.CAL_HOUR < '0')
      

        Activity

        Hide
        zhongjian Zhong,Jason added a comment -

        Hao Chen Hao, could you take a look at this?

        Show
        zhongjian Zhong,Jason added a comment - Hao Chen Hao, could you take a look at this?
        Hide
        haoch Hao Chen added a comment -

        Looking into it.

        Show
        haoch Hao Chen added a comment - Looking into it.
        Hide
        Shaofengshi Shaofeng SHI added a comment -

        Hao Chen have you started working on this? we plan to release v1.5.1; if you haven't I will take over this. Thanks.

        Show
        Shaofengshi Shaofeng SHI added a comment - Hao Chen have you started working on this? we plan to release v1.5.1; if you haven't I will take over this. Thanks.
        Hide
        Shaofengshi Shaofeng SHI added a comment -

        Push a commit for this issue: https://github.com/apache/kylin/commit/b37c5bb293fe2025bf7e7b0c4901d13c02fc86eb

        While I don't have such a case now, please help to review and verify Nick Muerdter

        Show
        Shaofengshi Shaofeng SHI added a comment - Push a commit for this issue: https://github.com/apache/kylin/commit/b37c5bb293fe2025bf7e7b0c4901d13c02fc86eb While I don't have such a case now, please help to review and verify Nick Muerdter
        Hide
        haoch Hao Chen added a comment -

        Shishaodong Sorry for the delayed response, thanks for the fixing.

        Show
        haoch Hao Chen added a comment - Shishaodong Sorry for the delayed response, thanks for the fixing.
        Hide
        lidong_sjtu Dong Li added a comment -

        Resolved in release 1.5.1 (2016-04-13)

        Show
        lidong_sjtu Dong Li added a comment - Resolved in release 1.5.1 (2016-04-13)

          People

          • Assignee:
            Shaofengshi Shaofeng SHI
            Reporter:
            nickm Nick Muerdter
            Request participants:
            None
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: