Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5380

Number of outgoing records not reported in web interface

    Details

      Description

      The web frontend does not report any outgoing records in the web frontend.
      The amount of data in MB is reported correctly.

        Issue Links

          Activity

          Hide
          Zentol Chesnay Schepler added a comment -

          master: cb05915759b1d5ea4dbfcdd3ff76dcfd9cebe601
          1.2: 792f7e45216377fa1d6f29dfc767d83cf1a84f37

          Show
          Zentol Chesnay Schepler added a comment - master: cb05915759b1d5ea4dbfcdd3ff76dcfd9cebe601 1.2: 792f7e45216377fa1d6f29dfc767d83cf1a84f37
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3068

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3068
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3068

          Yes, its some fancy multi-chaining of the streaming API.

          Then lets merge it as is

          Show
          githubbot ASF GitHub Bot added a comment - Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3068 Yes, its some fancy multi-chaining of the streaming API. Then lets merge it as is
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3068

          So that's like...a side output? I didn't even know that was possible.

          Until we can display operators in the web-interface, and by extension in the ExecutionGraph, i don't see a way to change this.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3068 So that's like...a side output? I didn't even know that was possible. Until we can display operators in the web-interface, and by extension in the ExecutionGraph, i don't see a way to change this.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3068

          Mh, I just realized that the streaming APIs multichaining is leading to some ugly side effects.

          In below screenshot, you see that the number of outgoing records is reported as 247. These records are the number of elements emitted by the filter.
          The task below the source is the window that is consuming the data from the filter.
          Now the next two tasks also consume data from the same task, but without the filter. You can see that they've consumed much more data (because its unfiltered).

          ![image](https://cloud.githubusercontent.com/assets/89049/21997216/58c989a2-dc2e-11e6-9065-214dffcc6bbc.png)

          Show
          githubbot ASF GitHub Bot added a comment - Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3068 Mh, I just realized that the streaming APIs multichaining is leading to some ugly side effects. In below screenshot, you see that the number of outgoing records is reported as 247. These records are the number of elements emitted by the filter. The task below the source is the window that is consuming the data from the filter. Now the next two tasks also consume data from the same task, but without the filter. You can see that they've consumed much more data (because its unfiltered). ! [image] ( https://cloud.githubusercontent.com/assets/89049/21997216/58c989a2-dc2e-11e6-9065-214dffcc6bbc.png )
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3068

          merging.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3068 merging.
          Show
          githubbot ASF GitHub Bot added a comment - Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3068 +1 to merge ! [image] ( https://cloud.githubusercontent.com/assets/89049/21963634/948c5988-db3e-11e6-8641-6089521e9d87.png )
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user zentol opened a pull request:

          https://github.com/apache/flink/pull/3068

          FLINK-5380 Fix task metrics reuse for single-operator chains

          This PR fixes the chain end detection in the `StreamingJobGraphGenerator`, which fixes the re-use of metrics for tasks consisting of only a single operator, which fixes the display in the WebInterface.

          I've also added a test to make sure that chain start and end are set correctly. Before the fix, `assertTrue(sourceConfig.isChainEnd());` would have failed.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/zentol/flink 5380_web_metrics

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3068.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3068


          commit bfd038d13b73816adb4dab1387b1f75ecf2f2bfc
          Author: zentol <chesnay@apache.org>
          Date: 2017-01-05T13:37:03Z

          FLINK-5380 Fix task metrics reuse for single-operator chains


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/3068 FLINK-5380 Fix task metrics reuse for single-operator chains This PR fixes the chain end detection in the `StreamingJobGraphGenerator`, which fixes the re-use of metrics for tasks consisting of only a single operator, which fixes the display in the WebInterface. I've also added a test to make sure that chain start and end are set correctly. Before the fix, `assertTrue(sourceConfig.isChainEnd());` would have failed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 5380_web_metrics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3068.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3068 commit bfd038d13b73816adb4dab1387b1f75ecf2f2bfc Author: zentol <chesnay@apache.org> Date: 2017-01-05T13:37:03Z FLINK-5380 Fix task metrics reuse for single-operator chains
          Hide
          sachingoel0101 Sachin Goel added a comment -

          No problem!

          Show
          sachingoel0101 Sachin Goel added a comment - No problem!
          Hide
          Zentol Chesnay Schepler added a comment -

          Sachin Goel Would you mind if i take this over?

          Show
          Zentol Chesnay Schepler added a comment - Sachin Goel Would you mind if i take this over?
          Hide
          Zentol Chesnay Schepler added a comment -

          This is a problem with detecting which operator's metrics should be used for a task.

          Consider a chain A -> B. Both operators A and B measure how many records they receive and send. Since the WebInterface however only views them as a single task we don't need all of these; we only need the input count from A and output from B; or in other words the input from the first operator and the output from the last.

          Which metrics are reused is governed by the isChainStart/isChainEnd properties of the StreamConfig, which are set in the in the StreamingJobGraphGenerator#createChain.

          The issue is simply that I did not account for the case where a chain only consists of a single operator; in which case we need both the in- and output metrics.

          In short, adding

          if (chainableOutputs.isEmpty()) {
          	config.setChainEnd();
          }
          

          at the end of the

          if (currentNodeId.equals(startNodeId)) {
          

          block should fix this issue.

          Show
          Zentol Chesnay Schepler added a comment - This is a problem with detecting which operator's metrics should be used for a task. Consider a chain A -> B. Both operators A and B measure how many records they receive and send. Since the WebInterface however only views them as a single task we don't need all of these; we only need the input count from A and output from B; or in other words the input from the first operator and the output from the last. Which metrics are reused is governed by the isChainStart/isChainEnd properties of the StreamConfig, which are set in the in the StreamingJobGraphGenerator#createChain. The issue is simply that I did not account for the case where a chain only consists of a single operator; in which case we need both the in- and output metrics. In short, adding if (chainableOutputs.isEmpty()) { config.setChainEnd(); } at the end of the if (currentNodeId.equals(startNodeId)) { block should fix this issue.

            People

            • Assignee:
              Zentol Chesnay Schepler
              Reporter:
              rmetzger Robert Metzger
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development