Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5864

CEP: fix duplicate output patterns problem.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: CEP
    • Labels:
      None

      Description

      Currently when searching for a pattern a,b,c and we have input elements a -> b1 -> b2 ->c where b1 and b2 are both valid elements for the position b, then instead of having an output of 2 matched patterns: a, b1, c and a, b2, c, we have 4, with 2 copies of each valid pattern.

      The problem is with the creation of Dewey number, cause it is not increased on graph branching.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user dawidwys opened a pull request:

          https://github.com/apache/flink/pull/3390

          FLINK-5864 CEP: fix duplicate output patterns problem.

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [ ] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [ ] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dawidwys/flink flink5864

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3390.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3390


          commit e78a404175bd938f60030af26f27468f859f01c9
          Author: Dawid Wysakowicz <dawid@getindata.com>
          Date: 2017-02-22T12:31:43Z

          FLINK-5864 CEP: fix duplicate output patterns problem.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user dawidwys opened a pull request: https://github.com/apache/flink/pull/3390 FLINK-5864 CEP: fix duplicate output patterns problem. Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [ ] General The pull request references the related JIRA issue (" [FLINK-XXX] Jira title text") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [ ] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [ ] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/dawidwys/flink flink5864 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3390.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3390 commit e78a404175bd938f60030af26f27468f859f01c9 Author: Dawid Wysakowicz <dawid@getindata.com> Date: 2017-02-22T12:31:43Z FLINK-5864 CEP: fix duplicate output patterns problem.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3390#discussion_r103405219

          — Diff: flink-libraries/flink-cep/src/main/java/org/apache/flink/cep/nfa/NFA.java —
          @@ -254,7 +255,18 @@ public int hashCode() {
          states.push(stateTransition.getTargetState());
          break;
          case IGNORE:

          • resultingComputationStates.add(computationState);
            + final DeweyNumber version;
            + if (branched) {
              • End diff –

          Here we assume that the `IGNORE` is going to be processed after the `TAKE`. This is true because of the way the `NFACompiler` creates the pattern graph, but different implementations may change in the future so we should somehow impose this ordering. It could be by sorting the `stateTransitions` list in the `State` class, or by sorting them just before processing them in the `NFA::computeNextStates()`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/3390#discussion_r103405219 — Diff: flink-libraries/flink-cep/src/main/java/org/apache/flink/cep/nfa/NFA.java — @@ -254,7 +255,18 @@ public int hashCode() { states.push(stateTransition.getTargetState()); break; case IGNORE: resultingComputationStates.add(computationState); + final DeweyNumber version; + if (branched) { End diff – Here we assume that the `IGNORE` is going to be processed after the `TAKE`. This is true because of the way the `NFACompiler` creates the pattern graph, but different implementations may change in the future so we should somehow impose this ordering. It could be by sorting the `stateTransitions` list in the `State` class, or by sorting them just before processing them in the `NFA::computeNextStates()`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dawidwys commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3390#discussion_r103407276

          — Diff: flink-libraries/flink-cep/src/main/java/org/apache/flink/cep/nfa/NFA.java —
          @@ -254,7 +255,18 @@ public int hashCode() {
          states.push(stateTransition.getTargetState());
          break;
          case IGNORE:

          • resultingComputationStates.add(computationState);
            + final DeweyNumber version;
            + if (branched) {
              • End diff –

          Ok, will add the sorting in the State class.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dawidwys commented on a diff in the pull request: https://github.com/apache/flink/pull/3390#discussion_r103407276 — Diff: flink-libraries/flink-cep/src/main/java/org/apache/flink/cep/nfa/NFA.java — @@ -254,7 +255,18 @@ public int hashCode() { states.push(stateTransition.getTargetState()); break; case IGNORE: resultingComputationStates.add(computationState); + final DeweyNumber version; + if (branched) { End diff – Ok, will add the sorting in the State class.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dawidwys commented on the issue:

          https://github.com/apache/flink/pull/3390

          Just had a second thought, this solution is temporary anyway, so that the current Pattern graph is processed correctly. Before changing anything in `NFACompiler` we will need to change the `process` function in NFACompiler.

          We do not impose the first assumption, so not sure if we would benefit anything from imposing the second one.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dawidwys commented on the issue: https://github.com/apache/flink/pull/3390 Just had a second thought, this solution is temporary anyway, so that the current Pattern graph is processed correctly. Before changing anything in `NFACompiler` we will need to change the `process` function in NFACompiler. We do not impose the first assumption, so not sure if we would benefit anything from imposing the second one.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/3390

          I agree that the solution is temporary but the new version (with the richer pattern support) could take some time. As for the sorting, we could have a "pre-traversal" of the transitions, as you had in your previous version of this PR.

          It could be a separate method that copies the state transitions list and sorts it on the `action` field (make the `StateTransition` comparable) before the for-loop in the `computeNextStates`.

          This is not the most efficient solution, as it creates copies and sorts every time, but it is enough until we update the whole `NFA`/`NFACompiler`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/3390 I agree that the solution is temporary but the new version (with the richer pattern support) could take some time. As for the sorting, we could have a "pre-traversal" of the transitions, as you had in your previous version of this PR. It could be a separate method that copies the state transitions list and sorts it on the `action` field (make the `StateTransition` comparable) before the for-loop in the `computeNextStates`. This is not the most efficient solution, as it creates copies and sorts every time, but it is enough until we update the whole `NFA`/`NFACompiler`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/3390

          Thanks @dawidwys for the work! Merging this...

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/3390 Thanks @dawidwys for the work! Merging this...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/3390

          Hi @dawidwys I merged this. Can you close this PR and the related JIRA?

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/3390 Hi @dawidwys I merged this. Can you close this PR and the related JIRA?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dawidwys closed the pull request at:

          https://github.com/apache/flink/pull/3390

          Show
          githubbot ASF GitHub Bot added a comment - Github user dawidwys closed the pull request at: https://github.com/apache/flink/pull/3390

            People

            • Assignee:
              dawidwys Dawid Wysakowicz
              Reporter:
              kkl0u Kostas Kloudas
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development