HBASE-25741

Deadlock during peer cleanup with NoNodeException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.7.0
    • Component/s: Replication

    Description

      We have observed that replication source metrics for a peer persist on some region servers even though the peer has been removed. When ReplicationSource encounters a NoNodeException, it calls the `peerRemoved` workflow, which should eventually terminate the source and remove it from the source manager. The problem is that the ReplicationSource thread terminates itself partway through that workflow, so the removePeer action never completes and the metrics for the source are left behind forever. The flow is: the replication source tries to clean WALs, hits a NoNodeException, calls `peerRemoved`, and terminates the source (itself), leaving the terminated source registered in the source manager with its metrics uncleared.
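      The failure mode can be illustrated with a toy model. This is only a sketch of the general pattern (a thread running its own removal workflow), not HBase code: the class `PeerCleanupSketch`, its maps, and its methods are all hypothetical stand-ins, and the real ReplicationSource and source manager logic may differ in detail. It assumes the workflow interrupts the source thread and waits for it to exit before deregistering it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: none of these names are HBase classes or methods.
class PeerCleanupSketch {
    // Hypothetical stand-ins for the source manager's bookkeeping.
    final Map<String, Thread> sources = new ConcurrentHashMap<>();
    final Map<String, Long> metrics = new ConcurrentHashMap<>();

    void register(String peerId, Thread source) {
        sources.put(peerId, source);
        metrics.put(peerId, 0L);
    }

    // Assumed removal workflow: stop the source, then drop its state.
    void peerRemoved(String peerId) throws InterruptedException {
        Thread source = sources.get(peerId);
        source.interrupt();      // ask the source thread to stop
        source.join();           // wait for it to exit
        sources.remove(peerId);  // skipped if join() throws
        metrics.remove(peerId);  // skipped if join() throws
    }

    // The source thread runs the workflow on itself. Returns true if state leaked.
    boolean removeFromSourceThread(String peerId) {
        Thread source = new Thread(() -> {
            try {
                peerRemoved(peerId); // interrupt() targets this very thread...
            } catch (InterruptedException e) {
                // ...so join() throws at once and the thread exits mid-workflow.
            }
        });
        register(peerId, source);
        source.start();
        awaitExit(source);
        return sources.containsKey(peerId) || metrics.containsKey(peerId);
    }

    // A different thread runs the workflow. Returns true if state leaked.
    boolean removeFromOtherThread(String peerId) {
        Thread source = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // pretend to work until interrupted
            } catch (InterruptedException e) {
                // stop requested: fall through and exit
            }
        });
        register(peerId, source);
        source.start();
        try {
            peerRemoved(peerId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sources.containsKey(peerId) || metrics.containsKey(peerId);
    }

    static void awaitExit(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PeerCleanupSketch sketch = new PeerCleanupSketch();
        System.out.println("self-removal leaked state: "
            + sketch.removeFromSourceThread("peer-1"));
        System.out.println("external removal leaked state: "
            + sketch.removeFromOtherThread("peer-2"));
    }
}
```

      In this model the self-removal case leaves the source and its metrics registered, while the same workflow driven from another thread cleans up fully. One plausible way to avoid the leak in such a design is to run the removal steps on a thread other than the source itself, or to deregister the source before stopping it; whether the actual fix takes either approach is not stated here.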

       


            People

              Assignee: Sandeep Pal (sandeep.pal)
              Reporter: Sandeep Pal (sandeep.pal)