Flume / FLUME-3092

Extend the FileChannel's monitoring metrics

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.8.0
    • Component/s: File Channel
    • Labels:
      None

      Description

      There are already several generic metrics (e.g. eventPutAttemptCount and eventPutSuccessCount) which can be used to create compound metrics for monitoring the FileChannel's health.
      Some monitoring systems aren't capable of calculating such derived metrics, though, so I recommend adding the following extra counters to indicate whether a channel operation failed or the channel is in an unhealthy state.

      • eventPutErrorCount: incremented if an IOException occurs during a put operation.
      • eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs during a take operation.
      • checkpointWriteErrorCount: incremented if an exception occurs during a checkpoint write.
      • unhealthy: this flag indicates whether the channel failed to start up properly (i.e. whether the replay ran into a problem). It is similar to the already existing open flag, except that open is initially false and is set to true once the initialization (including the log replay) has completed successfully, whereas unhealthy is false by default and is set to true if an error occurs during startup.
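To make the intended semantics concrete, here is a minimal sketch of where such counters could be updated. It assumes a simplified channel that delegates to a storage backend; the names `EventStore`, `ChannelErrorCounters` and `CountingChannel`, as well as the local `CorruptEventException` stand-in, are hypothetical and are not the actual FileChannel/Log code from the patch. Each wrapper increments its counter and rethrows, so callers still see the original failure.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for Flume's CorruptEventException (illustration only).
class CorruptEventException extends Exception {
  CorruptEventException(String message) {
    super(message);
  }
}

// Hypothetical holder for the proposed error metrics.
class ChannelErrorCounters {
  final AtomicLong eventPutErrorCount = new AtomicLong();
  final AtomicLong eventTakeErrorCount = new AtomicLong();
  final AtomicLong checkpointWriteErrorCount = new AtomicLong();
  // False by default; set to true only if startup (e.g. the log replay) fails.
  volatile boolean unhealthy = false;
}

// Hypothetical storage backend; in the real FileChannel the Log class plays a roughly similar role.
interface EventStore {
  void replay() throws IOException;
  void put(byte[] event) throws IOException;
  byte[] take() throws IOException, CorruptEventException;
  void writeCheckpoint() throws Exception;
}

// Wraps an EventStore and records failures in the counters before rethrowing them.
class CountingChannel {
  private final EventStore store;
  final ChannelErrorCounters counters = new ChannelErrorCounters();
  // Like the existing open flag: false until initialization succeeds.
  private volatile boolean open = false;

  CountingChannel(EventStore store) {
    this.store = store;
  }

  void start() {
    try {
      store.replay();
      open = true;
    } catch (Exception e) {
      // Startup failed, so the channel is not capable of normal operation.
      counters.unhealthy = true;
    }
  }

  void put(byte[] event) throws IOException {
    try {
      store.put(event);
    } catch (IOException e) {
      counters.eventPutErrorCount.incrementAndGet();
      throw e;
    }
  }

  byte[] take() throws IOException, CorruptEventException {
    try {
      return store.take();
    } catch (IOException | CorruptEventException e) {
      counters.eventTakeErrorCount.incrementAndGet();
      throw e;
    }
  }

  void checkpoint() throws Exception {
    try {
      store.writeCheckpoint();
    } catch (Exception e) {
      counters.checkpointWriteErrorCount.incrementAndGet();
      throw e;
    }
  }

  boolean isOpen() {
    return open;
  }
}
```

Before `start()` both `open` and `unhealthy` are false; a successful start sets only `open`, and a failed one sets only `unhealthy`. This is why the two flags are not simply the inverse of each other.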

      Besides these flags, I'd also introduce a closed flag, which is the numeric representation (1: closed, 0: open) of the negated (already existing) open flag.
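A sketch of the proposed closed value next to the kind of compound metric mentioned in the first paragraph, assuming a monitoring system that can only consume numeric attributes. `ChannelStateGauge` and `derivedPutFailures` are hypothetical names used for illustration only.

```java
// Hypothetical sketch: the open/closed state exposed both as a boolean and as
// a number, plus the kind of compound metric a monitoring system would
// otherwise have to derive from the generic counters itself.
class ChannelStateGauge {
  // Like the existing open flag: false until initialization succeeds.
  private volatile boolean open = false;

  void setOpen(boolean open) {
    this.open = open;
  }

  boolean isOpen() {
    return open;
  }

  // Proposed numeric flag, the negation of open: 1 = closed, 0 = open.
  int getClosed() {
    return open ? 0 : 1;
  }

  // Compound metric built from two generic counters. It only approximates
  // the number of failed puts (see the note below).
  static long derivedPutFailures(long eventPutAttemptCount, long eventPutSuccessCount) {
    return eventPutAttemptCount - eventPutSuccessCount;
  }
}
```

If successes are only counted when a transaction commits (an assumption in this sketch), the attempt/success difference may also include events from rolled-back or still uncommitted transactions, so a dedicated eventPutErrorCount is easier to alert on directly.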


          Activity

          Hudson added a comment:

          UNSTABLE: Integrated in Jenkins build Flume-trunk-hbase-1 #248 (See https://builds.apache.org/job/Flume-trunk-hbase-1/248/)
          FLUME-3092. Extend the FileChannel's monitoring metrics (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=fdc53f338931b96addf06f3f2be59da128656ec0)

          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          • (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelBase.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/instrumentation/FileChannelCounterMBean.java
          • (add) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelErrorMetrics.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/instrumentation/FileChannelCounter.java
          • (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLog.java
          ASF GitHub Bot added a comment:

          Github user asfgit closed the pull request at:

          https://github.com/apache/flume/pull/131

          ASF subversion and git services added a comment:

          Commit fdc53f338931b96addf06f3f2be59da128656ec0 in flume's branch refs/heads/trunk from Denes Arvay
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=fdc53f3 ]

          FLUME-3092. Extend the FileChannel's monitoring metrics

          This patch adds the following new metrics to the FileChannel's counters:

          • eventPutErrorCount: incremented if an IOException occurs during put operation.
          • eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
            during take operation.
          • checkpointWriteErrorCount: incremented if an exception occurs during checkpoint write.
          • unhealthy: this flag represents whether the channel has started successfully
            (i.e. the replay ran without any problem), so the channel is capable of normal operation.
          • closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.

          Closes #131.

          Reviewers: Attila Simon, Mike Percy

          (Denes Arvay via Mike Percy)

          ASF GitHub Bot added a comment:

          GitHub user adenes opened a pull request:

          https://github.com/apache/flume/pull/131

          FLUME-3092. Extend the FileChannel's monitoring metrics

          This patch adds the following new metrics to the FileChannel's counters:

          • eventPutErrorCount: incremented if an IOException occurs during put operation.
          • eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
            during take operation.
          • checkpointWriteErrorCount: incremented if an exception occurs during checkpoint write.
          • unhealthy: this flag represents whether the channel has started successfully
            (i.e. the replay ran without any problem), so the channel is capable of normal operation.
          • closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/adenes/flume FLUME-3092

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flume/pull/131.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #131


          commit 7c5957e4692817482519e6b9da20d29324a7f332
          Author: Denes Arvay <denes@cloudera.com>
          Date: 2017-05-09T14:23:31Z

          FLUME-3092. Extend the FileChannel's monitoring metrics

          This patch adds the following new metrics to the FileChannel's counters:

          • eventPutErrorCount: incremented if an IOException occurs during put operation.
          • eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
            during take operation.
          • checkpointWriteErrorCount: incremented if an exception occurs during checkpoint write.
          • unhealthy: this flag represents whether the channel has started successfully
            (i.e. the replay ran without any problem), so the channel is capable of normal operation.
          • closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.


            People

            • Assignee: Denes Arvay
            • Reporter: Denes Arvay
            • Votes: 0
            • Watchers: 5
