HBASE-9286

[0.94] ageOfLastShippedOp replication metric doesn't update if the slave regionserver is stalled


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.12
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In the replication manager we have

          HRegionInterface rrs = getRS();
          // Blocking RPC to the slave regionserver
          rrs.replicateLogEntries(Arrays.copyOf(this.entriesArray, currentNbEntries));
          ....
          // Reached only after the RPC above has returned
          this.metrics.setAgeOfLastShippedOp(
              this.entriesArray[currentNbEntries-1].getKey().getWriteTime());
          break;

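      This makes sense, but it is wrong. The problem is that rrs.replicateLogEntries will block for a very long time if the slave server is suspended or unavailable but not down. While the call is blocked, setAgeOfLastShippedOp is never reached, so the ageOfLastShippedOp metric keeps its last value instead of growing.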

      However, this is easy to fix. We just need to call refreshAgeOfLastShippedOp() on a regular basis, in a different thread. I've attached a patch that fixes this for cdh4. I can make one for trunk and other branches as well if needed; it's a small change.
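
The idea of refreshing the metric from a separate thread can be sketched roughly as follows. This is a minimal illustration, not the attached patch. The `Metrics` class is a hypothetical stand-in for the real replication metrics object, and the injectable clock, the `startRefresher` helper, the thread name and the refresh period are all assumptions made for the example. It also assumes Java 8 or later for the lambda syntax.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class AgeMetricSketch {

    /** Hypothetical stand-in for the replication source metrics object. */
    static final class Metrics {
        private final LongSupplier clock;   // injectable so the behaviour can be tested
        private long lastShippedWriteTime;  // write time of the last shipped edit
        private long ageOfLastShippedOp;    // clock time minus that write time

        Metrics(LongSupplier clock) {
            this.clock = clock;
            this.lastShippedWriteTime = clock.getAsLong();
        }

        /** Called by the shipping thread after a successful replicateLogEntries. */
        synchronized void setAgeOfLastShippedOp(long writeTime) {
            lastShippedWriteTime = writeTime;
            ageOfLastShippedOp = clock.getAsLong() - writeTime;
        }

        /** Recomputes the age from the remembered write time; safe to call from any thread. */
        synchronized void refreshAgeOfLastShippedOp() {
            setAgeOfLastShippedOp(lastShippedWriteTime);
        }

        synchronized long getAgeOfLastShippedOp() {
            return ageOfLastShippedOp;
        }
    }

    /** Starts a daemon thread that refreshes the metric every periodMillis. */
    static ScheduledExecutorService startRefresher(Metrics metrics, long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "age-of-last-shipped-op-refresher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(metrics::refreshAgeOfLastShippedOp,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        Metrics metrics = new Metrics(System::currentTimeMillis);
        ScheduledExecutorService refresher = startRefresher(metrics, 100);

        metrics.setAgeOfLastShippedOp(System.currentTimeMillis());
        Thread.sleep(500); // stands in for a replicateLogEntries call that is blocked
        System.out.println("age while blocked (ms): " + metrics.getAgeOfLastShippedOp());

        refresher.shutdownNow();
    }
}
```

Because the refresher runs independently of the shipping thread, the reported age keeps growing while the RPC is stuck. The methods are synchronized so that a refresh cannot overwrite a newer write time recorded concurrently by the shipping thread.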


              People

              • Assignee:
                posix4e Alex Newman
                Reporter:
                posix4e Alex Newman
