Uploaded image for project: 'Geode'
  1. Geode
  2. GEODE-8686

Tombstone removal optimization during GII could cause deadlock

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
    • Fix Version/s: 1.12.1, 1.13.1, 1.14.0
    • Component/s: None

      Description

      Similar to the issue described in GEODE-6526, if the condition in the below if statement in AbstractRegionMap.initialImagePut() evaluates to true, a call to AbstractRegionMap.removeTombstone() will be triggered that could lead to deadlock between the calling thread and a Tombstone GC thread calling TombstoneService.gcTombstones()

      if (owner.getServerProxy() == null && owner.getVersionVector().isTombstoneTooOld( entryVersion.getMemberID(), entryVersion.getRegionVersion())) { 
        // the received tombstone has already been reaped, so don't retain it 
        if (owner.getIndexManager() != null) { 
          owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY, IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP); 
        } 
        removeTombstone(oldRe, entryVersion, false, false); 
        return false; 
      } else { 
        owner.scheduleTombstone(oldRe, entryVersion); 
        lruEntryDestroy(oldRe); 
      }
      

      The proposed change is to remove this if statement and allow the old tombstone to be collected later by calling scheduleTombstone() in all cases. The call to AbstractRegionMap.removeTombstone() in AbstractRegionMap.initialImagePut() is intended to be an optimization to allow immediate removal of tombstones that we know have already been collected on other members, but since the conditions to trigger it are rare (the old entry must be a tombstone, the new entry received during GII must be a tombstone with a newer version, and we must have already collected a tombstone with a newer version than the new entry) and the overhead of scheduling a tombstone to be collected is comparatively low, the performance impact of removing this optimization in favour of simply scheduling the tombstone to be collected in all cases should be insignificant.

      The solution to the deadlock observed in GEODE-6526 was also to remove the call to AbstractRegionMap.removeTombstone() and allow the tombstone to be collected later and did not result in any unwanted behaviour, so the proposed fix should be similarly low-impact.

      Also of note is that with this proposed change, there will be no calls to AbstractRegionMap.removeTombstone() outside of the TombstoneService class, which should ensure that other deadlocks involving this method are not possible.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                donalevans Donal Evans
                Reporter:
                donalevans Donal Evans
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: