Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 1.11.0
    • Fix Version/s: 1.13.0
    • Component/s: core
    • Labels:

      Description

      MATCH_RECOGNIZE syntax like this:https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8980
      Only pattern and define is supported in first step.

      PATTERN: Defining the Row Pattern to Be Matched

      The PATTERN keyword specifies the pattern to be recognized in the ordered sequence of rows in a partition. Each variable name in a pattern corresponds to a Boolean condition, which is specified later using the DEFINE component of the syntax.

      The PATTERN clause is used to specify a regular expression. It is outside the scope of this material to explain regular expression concepts and details. If you are not familiar with regular expressions, you are encouraged to familiarize yourself with the topic using other sources.

      The regular expression in a PATTERN clause is enclosed in parentheses. PATTERN may use the following operators:

      • Concatenation

      Concatenation is used to list two or more items in a pattern to be matched in that order. Items are concatenated when there is no operator sign between two successive items. For example: PATTERN (A B C).

      • Quantifiers

      Quantifiers define the number of iterations accepted for a match. Quantifiers are postfix operators with the following choices:

        • * — 0 or more iterations
        • + — 1 or more iterations
        • ? — 0 or 1 iterations
        • {n}

          — n iterations (n > 0)

        • {n,} — n or more iterations (n >= 0)
          ** {n,m} — between n and m (inclusive) iterations (0 <= n <= m, 0 < m)
          ** {,m} — between 0 and m (inclusive) iterations (m > 0)
          ** reluctant quantifiers — indicated by an additional question mark following a quantifier (*?, +?, ??, {n,}

          ?,

          { n, m }

          ?,

          {,m}

          ?). See "Reluctant Versus Greedy Quantifier" for the difference between reluctant and non-reluctant quantifiers.

      The following are examples of using quantifier operators:

        • A* matches 0 or more iterations of A
        • A {3,6}

          matches 3 to 6 iterations of A

        • A {,4}

          matches 0 to 4 iterations of A

      • Alternation
        Alternation matches a single regular expression from a list of several possible regular expressions. The alternation list is created by placing a vertical bar (|) between each regular expression. Alternatives are preferred in the order they are specified. As an example, PATTERN (A | B | C) attempts to match A first. If A is not matched, it attempts to match B. If B is not matched, it attempts to match C.
      • Grouping
        Grouping treats a portion of the regular expression as a single unit, enabling you to apply regular expression operators such as quantifiers to that group. Grouping is created with parentheses. As an example, PATTERN ((A B) {3}

        C) attempts to match the group (A B) three times and then seeks one occurrence of C.

      • PERMUTE
        See "How to Express All Permutations" for more information.
      • Exclusion
        Parts of the pattern to be excluded from the output of ALL ROWS PER MATCH are enclosed between {- and -}. See "How to Exclude Portions of the Pattern from the Output".
      • Anchors
        Anchors work in terms of positions rather than rows. They match a position either at the start or end of a partition.
        • ^ matches the position before the first row in the partition.
        • $ matches the position after the last row in the partition.
          As an example, PATTERN (^A+$) will match only if all rows in a partition satisfy the condition for A. The resulting match spans the entire partition.
      • Empty pattern (), matches an empty set of rows
        This section contains the following topics:
        Reluctant Versus Greedy Quantifier
        Operator Precedence

        Reluctant Versus Greedy Quantifier

        Pattern quantifiers are referred to as greedy; they will attempt to match as many instances of the regular expression on which they are applied as possible. The exception is pattern quantifiers that have a question mark ? as a suffix, and those are referred to as reluctant. They will attempt to match as few instances as possible of the regular expression on which they are applied.

      The difference between greedy and reluctant quantifiers appended to a single pattern variable is illustrated as follows: A* tries to map as many rows as possible to A, whereas A*? tries to map as few rows as possible to A. For example:

      PATTERN (X Y* Z)
      

      The pattern consists of three variable names, X, Y, and Z, with Y quantified with *. This means a pattern match will be recognized and reported when the following condition is met by consecutive incoming input rows:

        • A row satisfies the condition that defines variable X followed by zero or more rows that satisfy the condition that defines the variable Y followed by a row that satisfies the condition that defines the variable Z.

      During the pattern matching process, after a row was mapped to X and 0 or more rows were mapped to Y, if the following row can be mapped to both variables Y and Z (which satisfies the defining condition of both Y and Z), then, because the quantifier * for Y is greedy, the row is preferentially mapped to Y instead of Z. Due to this greedy property, Y gets preference over Z and a greater number of rows to Y are mapped. If the pattern expression was PATTERN (X Y*? Z), which uses a reluctant quantifier *? over Y, then Z gets preference over Y.

      Operator Precedence

      The precedence of the elements in a regular expression, in decreasing order, is as follows:

      • row_pattern_primary
        These elements include primary pattern variables (pattern variables not created with the SUBSET clause described in "SUBSET: Defining Union Row Pattern Variables"), anchors, PERMUTE, parenthetic expressions, exclusion syntax, and empty pattern
      • Quantifier
        A row_pattern_primary may have zero or one quantifier.
      • Concatenation
      • Alternation
        Precedence of alternation is illustrated by PATTERN(A B | C D), which is equivalent to PATTERN ((A B) | (C D)). It is not, however, equivalent to PATTERN (A (B | C) D).
        Precedence of quantifiers is illustrated by PATTERN (A B ), which is equivalent to PATTERN (A (B)). It is not, however, PATTERN ((A B)*).
        A quantifier may not immediately follow another quantifier. For example, PATTERN(A**) is prohibited.
        It is permitted for a primary pattern variable to occur more than once in a pattern, for example, PATTERN (X Y X).

      *DEFINE*: Defining Primary Pattern Variables

      DEFINE is a mandatory clause, used to specify the conditions that define primary pattern variables. In the example:

      DEFINE UP AS UP.Price > PREV(UP.Price),
      DOWN AS DOWN.Price < PREV(DOWN.Price)
      

      UP is defined by the condition UP.Price > PREV (UP.Price), and DOWN is defined by the condition DOWN.Price < PREV (DOWN.Price). (PREV is a row pattern navigation operation which evaluates an expression in the previous row; see "Row Pattern Navigation Operations" regarding the complete set of row pattern navigation operations.)

      A pattern variable does not require a definition; if there is no definition, any row can be mapped to the pattern variable.

      A union row pattern variable (see discussion of SUBSET in "SUBSET: Defining Union Row Pattern Variables") cannot be defined by DEFINE, but can be referenced in the definition of a pattern variable.

      The definition of a pattern variable can reference another pattern variable, which is illustrated in Example 20-6.

      Example 20-6 Defining Pattern Variables

      SELECT *
      FROM Ticker MATCH_RECOGNIZE (
           PARTITION BY Symbol
           FROM Ticker
           MATCH_RECOGNIZE (
           PARTITION BY Symbol
           ORDER BY tstamp
           MEASURES FIRST (A.tstamp) AS A_Firstday,
                    LAST (D.tstamp) AS D_Lastday,
                    AVG (B.Price) AS B_Avgprice,
                    AVG (D.Price) AS D_Avgprice
           PATTERN (A B+ C+ D)
           SUBSET BC = (B,C)
           DEFINE A AS Price > 100,
                  B AS B.Price > A.Price,
                  C AS C.Price < AVG (B.Price),
                  D AS D.Price > MAX (BC.Price)
      ) M
      

      In this example:
      The definition of A implicitly references the universal row pattern variable (because of the unqualified column reference Price).
      The definition of B references the pattern variable A.
      The definition of C references the pattern variable B.
      The definition of D references the union row pattern variable BC.
      The conditions are evaluated on successive rows of a partition in a trial match, with the current row being tentatively mapped to a pattern variable as permitted by the pattern. To be successfully mapped, the condition must evaluate to true.
      In the previous example:

      A AS Price > 100
      

      Price refers to the Price in the current row, because the last row mapped to any primary row pattern variable is the current row, which is tentatively mapped to A. Alternatively, in this example, using A.Price would have led to the same results.

      B AS B.Price > A.Price
      

      B.Price refers to the Price in the current row (because B is being defined), whereas A.Price refers to the last row mapped to A. In view of the pattern, the only row mapped to A is the first row to be mapped.

      C AS C.Price < AVG(B.Price)
      

      Here C.Price refers to the Price in the current row, because C is being defined. The aggregate AVG (that is, insert Price) is computed as the average of all rows that are already mapped to B.

      D AS D.Price > MAX(BC.Price)
      

      The pattern variable D is similar to pattern variable C, though it illustrates the use of a union row pattern variable in the Boolean condition. In this case, MAX(BC.Price) returns the maximum price value of the rows matched to variable B or variable C. The semantics of Boolean conditions are discussed in more detail in "Expressions in MEASURES and DEFINE".

      MATCH_RECOGNIZE syntax :

      table_reference ::=
        {only (query_table_expression) | query_table_expression }[flashback_query_clause]
         [pivot_clause|unpivot_clause|row_pattern_recognition_clause] [t_alias]
      
      row_pattern_recognition_clause ::=
        MATCH_RECOGNIZE (
         PATTERN (row_pattern)
         DEFINE row_pattern_definition_list
         )
      
      row_pattern ::=
         row_pattern_term
        | row_pattern "|" row_pattern_term
      
      row_pattern_term ::=
         row_pattern_factor
        | row_pattern_term row_pattern_factor
      
      row_pattern_factor ::=
         row_pattern_primary [row_pattern_quantifier]
      
      row_pattern_quantifier ::=
          *[?]
         |+[?]
         |?[?]
         |"{"[unsigned_integer ],[unsigned_integer]"}"[?]
         |"{"unsigned_integer "}"
      
      row_pattern_primary ::=
         variable_name
         |$
         |^
         |([row_pattern])
         |"{-" row_pattern"-}"
         | row_pattern_permute
      
      row_pattern_permute ::=
         PERMUTE (row_pattern [, row_pattern] ...)
      
      row_pattern_subset_clause ::=
         SUBSET row_pattern_subset_item [, row_pattern_subset_item] ...
      
      row_pattern_subset_item ::=
         variable_name = (variable_name[ , variable_name]...)
      
      row_pattern_definition_list ::=
         row_pattern_definition[, row_pattern_definition]...
      
      row_pattern_definition ::=
         variable_name AS condition
      

        Issue Links

          Activity

          Hide
          ransom Zhiqiang He added a comment -

          https://github.com/apache/calcite/pull/378
          please review it. test case will add later.

          Show
          ransom Zhiqiang He added a comment - https://github.com/apache/calcite/pull/378 please review it. test case will add later.
          Hide
          julianhyde Julian Hyde added a comment -

          Very nice work! The way you have structured the code looks right.

          I know you intend to add test cases later, but I would add a few tests to SqlParserTest and SqlValidatorTest fairly soon. Be sure to include negative tests (e.g. references to invalid columns).

          There is a chance that the operators you have added to SqlStdOperatorTable (e.g. CLASSIFIER, "|") will be available in regular SQL, and we don't want that. Please add tests to make sure.

          Double-check your spelling and grammar. E.g. "the alternation operator in a pattern expression whtin a match_recognize clasue" should be "The alternation operator in a pattern expression within a match_recognize clause." Also SqlKind.RUNNINIG.

          Also add a section to reference.md for the syntax you have implemented. Mark it "not fully implemented" but having it in the doc will stimulate discussion and review.

          Once those tests and doc are written we will have high confidence that this works, and we can commit as an experimental/incomplete feature. This will be better for you, because you won't be working on a branch.

          Show
          julianhyde Julian Hyde added a comment - Very nice work! The way you have structured the code looks right. I know you intend to add test cases later, but I would add a few tests to SqlParserTest and SqlValidatorTest fairly soon. Be sure to include negative tests (e.g. references to invalid columns). There is a chance that the operators you have added to SqlStdOperatorTable (e.g. CLASSIFIER, "|") will be available in regular SQL, and we don't want that. Please add tests to make sure. Double-check your spelling and grammar. E.g. "the alternation operator in a pattern expression whtin a match_recognize clasue" should be "The alternation operator in a pattern expression within a match_recognize clause." Also SqlKind.RUNNINIG. Also add a section to reference.md for the syntax you have implemented. Mark it "not fully implemented" but having it in the doc will stimulate discussion and review. Once those tests and doc are written we will have high confidence that this works, and we can commit as an experimental/incomplete feature. This will be better for you, because you won't be working on a branch.
          Hide
          ransom Zhiqiang He added a comment -

          Julian Hyde thands for your review.
          the spelling error is already removed . and CLASSIFIER and other not used operator in SqlStdOperatorTable is already removed.
          I add syntax define in reference.md, I want to add more details , can we copy documents in Oracle to calcite reference.md?

          Show
          ransom Zhiqiang He added a comment - Julian Hyde thands for your review. the spelling error is already removed . and CLASSIFIER and other not used operator in SqlStdOperatorTable is already removed. I add syntax define in reference.md, I want to add more details , can we copy documents in Oracle to calcite reference.md?
          Hide
          julianhyde Julian Hyde added a comment -

          The Oracle documentation is copyright. I don't think you can copy it. You need to describe the API in your own words.

          Show
          julianhyde Julian Hyde added a comment - The Oracle documentation is copyright. I don't think you can copy it. You need to describe the API in your own words.
          Hide
          julianhyde Julian Hyde added a comment -

          Zhiqiang He, Please let me know when you think the PR is ready to commit, and I will review again. It can be "experimental" status, but must be accompanied by tests and draft documentation.

          Show
          julianhyde Julian Hyde added a comment - Zhiqiang He , Please let me know when you think the PR is ready to commit, and I will review again. It can be "experimental" status, but must be accompanied by tests and draft documentation.
          Hide
          ransom Zhiqiang He added a comment -

          Julian Hyde I am writing test case now ,after test case commit , I will tell you to test it and commit. thanks.

          Show
          ransom Zhiqiang He added a comment - Julian Hyde I am writing test case now ,after test case commit , I will tell you to test it and commit. thanks.
          Hide
          ransom Zhiqiang He added a comment -

          Julian Hyde I an already commit code in the merge request.
          I add some test case and code to support reltosql.
          you can merge it after review.
          or you can merge it after 1.12 release. thanks.

          Show
          ransom Zhiqiang He added a comment - Julian Hyde I an already commit code in the merge request. I add some test case and code to support reltosql. you can merge it after review. or you can merge it after 1.12 release. thanks.
          Hide
          julianhyde Julian Hyde added a comment -

          I have rebased your changes onto the latest, done some fix up (mainly cosmetic) and pushed to https://github.com/julianhyde/calcite/tree/1641-match. Here are the things I think still need to be done:

          • In validator, don't use "Preconditions.checkArgument" for user
            errors. Use "newValidatorError(..., RESOURCE.messageName)". Add at
            least one test for each error, and make sure that the error occurs
            at the right position in the query text.
          • Add javadoc comments for the functions added to SqlStdOperatorTable
          • Move PatternValidator and other inner classes further up
            SqlValidatorImpl.java, out of the "enums" section
          • Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being seen in regular SQL)
          • Should we rename MatchRecognize and LogicalMatchRecognize to Match?
          • In Parser.jj, move the rules further down the file
          • In reference.md, describe measureColumn and condition
          • Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors

          It seems like a long list, but I think we're close. Your work is very high quality.

          Do later:

          • Use session's case-sensitivity for matching names
          Show
          julianhyde Julian Hyde added a comment - I have rebased your changes onto the latest, done some fix up (mainly cosmetic) and pushed to https://github.com/julianhyde/calcite/tree/1641-match . Here are the things I think still need to be done: In validator, don't use "Preconditions.checkArgument" for user errors. Use "newValidatorError(..., RESOURCE.messageName)". Add at least one test for each error, and make sure that the error occurs at the right position in the query text. Add javadoc comments for the functions added to SqlStdOperatorTable Move PatternValidator and other inner classes further up SqlValidatorImpl.java, out of the "enums" section Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being seen in regular SQL) Should we rename MatchRecognize and LogicalMatchRecognize to Match? In Parser.jj, move the rules further down the file In reference.md, describe measureColumn and condition Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors It seems like a long list, but I think we're close. Your work is very high quality. Do later: Use session's case-sensitivity for matching names
          Hide
          ransom Zhiqiang He added a comment - - edited

          Julian Hyde I add a commit for this request. please review it.
          1,2,3,5,6 already finished.
          4:Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being seen in regular SQL)
          I am not understand your mean . I add some test case for FIRST in sqlvalidatorTest.
          you can give me more details is this test case it not right.
          7.In reference.md, describe measureColumn and condition
          I already add measure column define in refernce.md. and more details infomation or match recognize I will write it after all functions finished.
          8:Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors
          I am already execute mvn sit in JDK 7 , but not found some error. can you give me some details or example?

          Show
          ransom Zhiqiang He added a comment - - edited Julian Hyde I add a commit for this request. please review it. 1,2,3,5,6 already finished. 4:Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being seen in regular SQL) I am not understand your mean . I add some test case for FIRST in sqlvalidatorTest. you can give me more details is this test case it not right. 7.In reference.md, describe measureColumn and condition I already add measure column define in refernce.md. and more details infomation or match recognize I will write it after all functions finished. 8:Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors I am already execute mvn sit in JDK 7 , but not found some error. can you give me some details or example?
          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/4e103825. Thanks for the PR, Zhiqiang He!

          I disabled some of your tests: see the TODO in SqlValidatorTest.testMatchRecognizeInternals. Please fix those tests in your next iteration. You might need to move some of your functions out of SqlStdOperatorTable.

          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/4e103825 . Thanks for the PR, Zhiqiang He ! I disabled some of your tests: see the TODO in SqlValidatorTest.testMatchRecognizeInternals . Please fix those tests in your next iteration. You might need to move some of your functions out of SqlStdOperatorTable.
          Hide
          ransom Zhiqiang He added a comment -

          Julian Hyde I got it. I will fix it in another issue.
          your can close this issue now. thanks.

          Show
          ransom Zhiqiang He added a comment - Julian Hyde I got it. I will fix it in another issue. your can close this issue now. thanks.
          Hide
          ransom Zhiqiang He added a comment - - edited

          I'm create a new issues CALCITE-1686 to fix the test case issues. this issue can be closed now.

          Show
          ransom Zhiqiang He added a comment - - edited I'm create a new issues CALCITE-1686 to fix the test case issues. this issue can be closed now.

            People

            • Assignee:
              ransom Zhiqiang He
              Reporter:
              ransom Zhiqiang He
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development