  1. HBase
  2. HBASE-9286

[0.94] ageOfLastShippedOp replication metric doesn't update if the slave regionserver is stalled

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.94.12
    • None
    • None
    • Reviewed

    Description

      In replicationmanager:

      HRegionInterface rrs = getRS();
      rrs.replicateLogEntries(Arrays.copyOf(this.entriesArray, currentNbEntries));
      ....
      this.metrics.setAgeOfLastShippedOp(
          this.entriesArray[currentNbEntries-1].getKey().getWriteTime());
      break;

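      This looks reasonable, but it is wrong. rrs.replicateLogEntries will block for a very long time if the slave server is suspended or unavailable but not down. Because the metric is only set after that call returns, ageOfLastShippedOp stops updating for as long as the call is blocked.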

      However, this is easy to fix: we just need to call refreshAgeOfLastShippedOp() on a regular basis, in a different thread. I've attached a patch which fixes this for cdh4. I can make one for trunk and other branches as well if needed; it's a small change.
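      The idea of refreshing the metric from a separate thread can be sketched roughly as follows. This is not the attached patch. AgeMetric and AgeRefresher are hypothetical stand-ins rather than the HBase classes, the injectable clock and the initial age of zero are assumptions made for illustration, and the sketch uses Java 8 syntax for brevity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the replication source metrics (not the HBase class).
class AgeMetric {
    private final LongSupplier clock;   // supplies the current time in milliseconds
    private long lastWriteTime;         // write time of the last shipped edit
    private long ageOfLastShippedOp;    // current time minus lastWriteTime, in ms

    AgeMetric(LongSupplier clock) {
        this.clock = clock;
        this.lastWriteTime = clock.getAsLong(); // assumption: age starts at zero
    }

    // Called by the shipping thread once a batch has been replicated.
    synchronized void setAgeOfLastShippedOp(long writeTime) {
        lastWriteTime = writeTime;
        ageOfLastShippedOp = clock.getAsLong() - writeTime;
    }

    // Recomputes the age from the last known write time. Safe to call from
    // any thread, so the age keeps growing while the shipping thread is blocked.
    synchronized void refreshAgeOfLastShippedOp() {
        ageOfLastShippedOp = clock.getAsLong() - lastWriteTime;
    }

    synchronized long getAgeOfLastShippedOp() {
        return ageOfLastShippedOp;
    }
}

// Hypothetical helper that calls refreshAgeOfLastShippedOp() periodically
// on its own daemon thread, independent of the shipping thread.
class AgeRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    AgeRefresher(AgeMetric metric, long periodMs) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "age-of-last-shipped-op-refresher");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(
            metric::refreshAgeOfLastShippedOp, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

      With something like this in place, a shipping thread stuck inside a blocking call no longer freezes the reported age, because the refresher thread recomputes it on every tick. The refresh period is a trade-off between metric freshness and overhead.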

            People

              posix4e Alex Newman
              posix4e Alex Newman