Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1911

Support WITHIN clause in MATCH_RECOGNIZE

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: None
    • Labels:

      Description

      Window is an important feature for pattern detection, it defines the time duration for the events to match a pattern. Here is an example from doc:

      SELECT T.Ac2, T.Bc2, T.Cc2 
          FROM S
          MATCH_RECOGNIZE(
              MEASURES A.c2 as Ac2, B.c2 as Bc2, C.c2 as Cc2
              PATTERN (A (B+ | C)) within 3000 milliseconds 
              DEFINE 
                  A as A.c1=10 or A.c1=25, 
                  B as B.c1=20 or B.c1=15 or B.c1=25, 
                  C as C.c1=15
          ) as T
      

        Activity

        Hide
        michaelmior Michael Mior added a comment -

        Resolved in release 1.14.0 (2017-10-01)

        Show
        michaelmior Michael Mior added a comment - Resolved in release 1.14.0 (2017-10-01)
        Hide
        julianhyde Julian Hyde added a comment -

        Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/dfe251d7; thanks for the PR, Dian Fu! (You'll notice that I simplified the logic to convert an interval RexLiteral to a SqlLiteral, and also changed the order of the rowType parameter in the Match constructor.)

        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/dfe251d7 ; thanks for the PR, Dian Fu ! (You'll notice that I simplified the logic to convert an interval RexLiteral to a SqlLiteral, and also changed the order of the rowType parameter in the Match constructor.)
        Hide
        julianhyde Julian Hyde added a comment -

        Reviewing/testing now.

        Show
        julianhyde Julian Hyde added a comment - Reviewing/testing now.
        Hide
        dian.fu Dian Fu added a comment -

        Hi Julian Hyde, I have addressed your comments in the latest PR and could you help to take a look? Very appreciated.

        Show
        dian.fu Dian Fu added a comment - Hi Julian Hyde , I have addressed your comments in the latest PR and could you help to take a look? Very appreciated.
        Hide
        dian.fu Dian Fu added a comment -

        Make sense. Have updated the tests for INTERVAL.

        Show
        dian.fu Dian Fu added a comment - Make sense. Have updated the tests for INTERVAL.
        Hide
        julianhyde Julian Hyde added a comment -

        You've addressed almost everything. I was pleased to see negative tests for ORDER BY.

        Can you make the unit tests for intervals much smaller? The only difference between testMatchRecognizeWithin6 and testMatchRecognizeWithin7 is the kind of interval. That's the purpose of those tests, after all. You know that, I know that, but whoever maintains this code in the future won't be able to see the wood for the trees. Don't write a 50 line test when a 1 line test will do. As I said earlier, you can test using a 1-line VALUES statement.

        Show
        julianhyde Julian Hyde added a comment - You've addressed almost everything. I was pleased to see negative tests for ORDER BY. Can you make the unit tests for intervals much smaller? The only difference between testMatchRecognizeWithin6 and testMatchRecognizeWithin7 is the kind of interval. That's the purpose of those tests, after all. You know that, I know that, but whoever maintains this code in the future won't be able to see the wood for the trees. Don't write a 50 line test when a 1 line test will do. As I said earlier, you can test using a 1-line VALUES statement.
        Hide
        dian.fu Dian Fu added a comment - - edited

        Have updated the PR according to review comments.
        Have also added the following method in LogicalMatch. Although this is not related to this JIRA, but the change is very small and so just fix it here.

        public RelNode accept(RelShuttle shuttle) {
            return shuttle.visit(this);
        }
        
        Show
        dian.fu Dian Fu added a comment - - edited Have updated the PR according to review comments. Have also added the following method in LogicalMatch . Although this is not related to this JIRA, but the change is very small and so just fix it here. public RelNode accept(RelShuttle shuttle) { return shuttle.visit( this ); }
        Hide
        julianhyde Julian Hyde added a comment -

        I think it's just a matter of generating the interval string when you create a SqlIntervalLiteral. I think you can write a for loop from the start time unit to the end time unit, and putting TimeUnit.separator between them.

        Can you add a unit test to RelToSqlConverterTest that checks that RexLiteral to SqlIntervalLiteral conversion works for "VALUES INTERVAL '1-2' YEAR TO MONTH" and a few other kinds of intervals.

        Show
        julianhyde Julian Hyde added a comment - I think it's just a matter of generating the interval string when you create a SqlIntervalLiteral. I think you can write a for loop from the start time unit to the end time unit, and putting TimeUnit.separator between them. Can you add a unit test to RelToSqlConverterTest that checks that RexLiteral to SqlIntervalLiteral conversion works for "VALUES INTERVAL '1-2' YEAR TO MONTH" and a few other kinds of intervals.
        Hide
        dian.fu Dian Fu added a comment -

        Julian Hyde, Zhiqiang He Thanks a lot for your review.

        I am surprised that you needed to add so much code for intervals in SqlParserUtil, and indeed in reference.md. Were you not able to re-use existing interval literal support?

        I have tried to reuse existing code, but didn't found any existing code to transform from RexLiteral of type INTERVAL_YEAR_MONTH or INTERVAL_DAY_TIME to SqlIntervalLiteral. I will investigate how this is done for the existing literal support.

        Show
        dian.fu Dian Fu added a comment - Julian Hyde , Zhiqiang He Thanks a lot for your review. I am surprised that you needed to add so much code for intervals in SqlParserUtil, and indeed in reference.md. Were you not able to re-use existing interval literal support? I have tried to reuse existing code, but didn't found any existing code to transform from RexLiteral of type INTERVAL_YEAR_MONTH or INTERVAL_DAY_TIME to SqlIntervalLiteral . I will investigate how this is done for the existing literal support.
        Hide
        ransom Zhiqiang He added a comment -

        SqlValidatorImpl.java shoud add check orderBy clause when within is used.
        we need a event time or process time to define the range of MATCH_RECOGNIZE. l just like a stream window.

        Show
        ransom Zhiqiang He added a comment - SqlValidatorImpl.java shoud add check orderBy clause when within is used. we need a event time or process time to define the range of MATCH_RECOGNIZE. l just like a stream window.
        Hide
        julianhyde Julian Hyde added a comment -

        A few review comments:

        • Add comments in Match etc. that interval can be null. Not-null is usually assumed.
        • Interval 0 is ok, right? So the message should say "Interval must be non-negative".
        • I don't think you need SqlIntervalQualifier.CONV. org.apache.calcite.avatica.util.TimeUnit.multiplier is probably sufficient.
        • In Match.explainTerms, if should ignore null interval, not print "interval: null".
        • I am surprised that you needed to add so much code for intervals in SqlParserUtil, and indeed in reference.md. Were you not able to re-use existing interval literal support?
        • As I said earlier, we need a way to specify ordering. Maybe there is a rowtime column on streams. But this clause also needs to work on tables, which definitely don't have a rowtime column.
        Show
        julianhyde Julian Hyde added a comment - A few review comments: Add comments in Match etc. that interval can be null. Not-null is usually assumed. Interval 0 is ok, right? So the message should say "Interval must be non-negative". I don't think you need SqlIntervalQualifier.CONV. org.apache.calcite.avatica.util.TimeUnit.multiplier is probably sufficient. In Match.explainTerms, if should ignore null interval, not print "interval: null". I am surprised that you needed to add so much code for intervals in SqlParserUtil, and indeed in reference.md. Were you not able to re-use existing interval literal support? As I said earlier, we need a way to specify ordering. Maybe there is a rowtime column on streams. But this clause also needs to work on tables, which definitely don't have a rowtime column.
        Hide
        dian.fu Dian Fu added a comment -

        Julian Hyde Zhiqiang He, Have create a PR. Could you help to review? Thanks a lot.
        https://github.com/apache/calcite/pull/509

        Show
        dian.fu Dian Fu added a comment - Julian Hyde Zhiqiang He , Have create a PR. Could you help to review? Thanks a lot. https://github.com/apache/calcite/pull/509
        Hide
        dian.fu Dian Fu added a comment -

        Right, I think the rowtime column can be seen as the timestamp column.

        Show
        dian.fu Dian Fu added a comment - Right, I think the rowtime column can be seen as the timestamp column.
        Hide
        julianhyde Julian Hyde added a comment -

        Here's a possible solution. If people want to use WITHIN, they have to define an ordering. So we add an optional ORDER BY sub-clause to MATCH_RECOGNIZE:

        SELECT T.Ac2, T.Bc2, T.Cc2 
            FROM S
            MATCH_RECOGNIZE(
                ORDER BY rowtime
                MEASURES A.c2 as Ac2, B.c2 as Bc2, C.c2 as Cc2
                PATTERN (A (B+ | C)) WITHIN INTERVAL '3' SECONDS
                DEFINE 
                    A as A.c1=10 or A.c1=25, 
                    B as B.c1=20 or B.c1=15 or B.c1=25, 
                    C as C.c1=15
            ) as T
        

        This is analogous to how you have to specify ORDER BY inside an OVER clause if you want to use use RANGE, e.g.:

        SELECT ticker, COUNT(*) OVER (
            PARTITION BY ticker
            ORDER BY rowtime
            RANGE INTERVAL '10' SECONDS PRECEDING)
        FROM Trades
        
        Show
        julianhyde Julian Hyde added a comment - Here's a possible solution. If people want to use WITHIN, they have to define an ordering. So we add an optional ORDER BY sub-clause to MATCH_RECOGNIZE : SELECT T.Ac2, T.Bc2, T.Cc2 FROM S MATCH_RECOGNIZE( ORDER BY rowtime MEASURES A.c2 as Ac2, B.c2 as Bc2, C.c2 as Cc2 PATTERN (A (B+ | C)) WITHIN INTERVAL '3' SECONDS DEFINE A as A.c1=10 or A.c1=25, B as B.c1=20 or B.c1=15 or B.c1=25, C as C.c1=15 ) as T This is analogous to how you have to specify ORDER BY inside an OVER clause if you want to use use RANGE , e.g.: SELECT ticker, COUNT(*) OVER ( PARTITION BY ticker ORDER BY rowtime RANGE INTERVAL '10' SECONDS PRECEDING) FROM Trades
        Hide
        dian.fu Dian Fu added a comment - - edited

        Correct, this is not in Oracle 12 and also not in the SQL standard, but I think this feature is very useful as window is very important for pattern detection.

        It assumes that there is a system column that defines the timestamp. Only possible in streaming queries, and even then, probably not well defined.

        Yes, that's right. For cases where there is no timestamp column, users don't need to define this clause and so there is no side effect.

        We should use the SQL standard notation for intervals: WITHIN INTERVAL '3' SECONDS rather than within 3000 milliseconds

        Make sense to me.

        Show
        dian.fu Dian Fu added a comment - - edited Correct, this is not in Oracle 12 and also not in the SQL standard, but I think this feature is very useful as window is very important for pattern detection. It assumes that there is a system column that defines the timestamp. Only possible in streaming queries, and even then, probably not well defined. Yes, that's right. For cases where there is no timestamp column, users don't need to define this clause and so there is no side effect. We should use the SQL standard notation for intervals: WITHIN INTERVAL '3' SECONDS rather than within 3000 milliseconds Make sense to me.
        Hide
        julianhyde Julian Hyde added a comment - - edited

        To be clear, this is not in Oracle 12, and it is not in the SQL standard. This is in Oracle fusion middleware's CQL. So, I am not yet convinced we should add this in the current form.

        A couple of immediate problems:

        • It assumes that there is a system column that defines the timestamp. Only possible in streaming queries, and even then, probably not well defined.
        • We should use the SQL standard notation for intervals: WITHIN INTERVAL '3' SECONDS rather than within 3000 milliseconds

        Cc Zhiqiang He

        Show
        julianhyde Julian Hyde added a comment - - edited To be clear, this is not in Oracle 12, and it is not in the SQL standard. This is in Oracle fusion middleware's CQL. So, I am not yet convinced we should add this in the current form. A couple of immediate problems: It assumes that there is a system column that defines the timestamp. Only possible in streaming queries, and even then, probably not well defined. We should use the SQL standard notation for intervals: WITHIN INTERVAL '3' SECONDS rather than within 3000 milliseconds Cc Zhiqiang He

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            dian.fu Dian Fu
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development