  1. HBase
  2. HBASE-25881

Create a chore to update age related metrics.

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      We had a case where the logRoller and ReplicationShipper threads were stuck for a day because another thread was holding a lock.

      During that day we did not roll the WAL and did not ship any edits.
      Even so, the oldestWalAge and age of last ship metrics did not spike as they should have.

      Every age-related metric is calculated the same way: when an event happens, we compute the difference between the current time and the event time and push that value to the metrics framework. The event timestamp is lost at that point.

      If the thread populating the metric is stuck, the same value is carried forward indefinitely, which makes it look as if nothing is wrong with the system. In this case the oldestWalAge metric was stuck at 809 and the age of last ship metric stayed at 0 the whole time, so no alert fired.
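
      The failure mode described above can be illustrated with a small, self-contained Java sketch. All class and method names here (AgeMetricSketch, PushedAge, TimestampedAge) are hypothetical and are not HBase APIs, and the value 809 is treated as milliseconds purely for illustration. The first class mimics computing the age once at event time; the second keeps the event timestamp so the age can be derived whenever the metric is read.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; none of these names come from HBase. */
public class AgeMetricSketch {

    /** Age is computed once, when the event happens; the timestamp is discarded. */
    public static final class PushedAge {
        private final AtomicLong ageMs = new AtomicLong();

        public void onEvent(long eventTimeMs, long nowMs) {
            ageMs.set(nowMs - eventTimeMs);
        }

        /** nowMs is ignored: the value is frozen until onEvent is called again. */
        public long read(long nowMs) {
            return ageMs.get();
        }
    }

    /** The event timestamp is kept, so the age can be derived at read time. */
    public static final class TimestampedAge {
        private final AtomicLong lastEventTimeMs = new AtomicLong();

        public void onEvent(long eventTimeMs) {
            lastEventTimeMs.set(eventTimeMs);
        }

        public long read(long nowMs) {
            return nowMs - lastEventTimeMs.get();
        }
    }

    public static void main(String[] args) {
        long t0 = 1_000_000L;                      // arbitrary starting clock value
        long oneDayMs = 24L * 60 * 60 * 1000;

        PushedAge pushed = new PushedAge();
        TimestampedAge stamped = new TimestampedAge();
        pushed.onEvent(t0 - 809, t0);
        stamped.onEvent(t0 - 809);

        // Assume the updating thread is stuck for a day: no further onEvent calls.
        System.out.println(pushed.read(t0 + oneDayMs));   // prints 809
        System.out.println(stamped.read(t0 + oneDayMs));  // prints 86400809
    }
}
```

      The pushed value never moves while the updater is stuck, whereas the timestamp-based value keeps growing and would eventually cross an alert threshold.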

      From Andrew Purtell,
      We have the Chore/ScheduledChore framework. We could be making more use of it. Much of this is legacy, before Chore was formalized as it is today.
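
      One possible shape for such a periodic refresh is sketched below. It is a sketch under stated assumptions, not the implementation proposed in this issue: it uses the JDK's ScheduledExecutorService as a stand-in for HBase's Chore/ScheduledChore framework so that it runs without HBase on the classpath, and the names AgeRefreshSketch, recordShip, and refresh are hypothetical. The worker thread only records the event timestamp, and a separate periodic task recomputes the published age from it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch; names are illustrative and are not HBase classes. */
public class AgeRefreshSketch {
    private final LongSupplier clock;                              // injectable for testing
    private final AtomicLong lastShipTimeMs;                       // event timestamp is retained
    private final AtomicLong ageOfLastShipMs = new AtomicLong();   // published gauge value

    public AgeRefreshSketch(LongSupplier clock) {
        this.clock = clock;
        this.lastShipTimeMs = new AtomicLong(clock.getAsLong());
    }

    /** Called by the worker thread each time the event happens. */
    public void recordShip() {
        lastShipTimeMs.set(clock.getAsLong());
    }

    /** Body of the periodic task: recompute the age from the stored timestamp. */
    public void refresh() {
        ageOfLastShipMs.set(clock.getAsLong() - lastShipTimeMs.get());
    }

    public long ageOfLastShipMs() {
        return ageOfLastShipMs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AgeRefreshSketch metric = new AgeRefreshSketch(System::currentTimeMillis);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for a scheduled chore: refresh the gauge every 100 ms.
        scheduler.scheduleAtFixedRate(metric::refresh, 0, 100, TimeUnit.MILLISECONDS);

        metric.recordShip();
        Thread.sleep(350);   // simulate a stuck worker: no further recordShip() calls
        System.out.println("age after stall (ms): " + metric.ageOfLastShipMs());
        scheduler.shutdownNow();
    }
}
```

      Because the refresh runs on its own thread, the gauge keeps growing even when the thread that records events is blocked, which is the behavior the alerting in this issue depended on.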

            People

              Assignee: rkrahul324 Rahul Kumar
              Reporter: shahrs87 Rushabh Shah