Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2933

Ability to monitor the jute.maxBuffer usage in real-time

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      This is related to jute.maxbuffer problems on the server side when Leader generates a proposal that doesn't fit into Follower's Jute buffer causing the quorum to be broken.

      Proposed solution is to add the following new JMX Beans:

      1. Add getJuteMaxBuffer to ZookeeperServerBean which monitors the current jute.maxbuffer setting,
      2. Add get last/min/max ProposalSize to LeaderBean which monitors the size of the latest/min/max proposal.

      The rationale behind this new feature is to add capability to JMX monitoring API to determine what is the current/min/max usage of the Jute buffer. This will let third party monitoring tools to get samples of buffer usage and create some statistics or generate alerts if it breaches a particular value.

      This will not solve the problems related to jute.maxbuffer setting on its own, but it's intended to be the first step towards better handling or preventing production issues to happen.

      Subtasks have been created to separately implement client and server side buffer size monitoring.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            andor Andor Molnar
            andor Andor Molnar
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1.5h
                1.5h

                Slack

                  Issue deployment