  1. Ignite
  2. IGNITE-10511

disco-event-worker can be deadlocked by BinaryContext.metadata running in sys striped pool waiting for cache entry lock


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • Docs Required

    Description

      See attached thread dump:

      disco-event-worker hangs in removeExplicitNodeLocks() waiting for the lock on a GridCacheMapEntry. That lock is held by GridDistributedTxRemoteAdapter, which acquired it in GridCacheMapEntry.innerSet().

      CacheObjectBinaryProcessorImpl is waiting for a metadata message from discovery, which cannot be processed because disco-event-worker is stuck.

      Possible fix. Currently onNodeLeft() looks like this:

          public void onNodeLeft(final ClusterNode node) {
              if (isDone() || !enterBusy())
                  return;
      
              cctx.mvcc().removeExplicitNodeLocks(node.id(), initialVersion());
      
              try {
                  onDiscoveryEvent(new IgniteRunnable() {
                      @Override public void run() {
                          if (isDone() || !enterBusy())
                              return;

                          ...
                      }
                  });
              }
              finally {
                  ...
              }
          }
      

      As we can see, most of the processing is already done asynchronously, inside the IgniteRunnable, on the exchange-worker thread.

       

      We can move 

          cctx.mvcc().removeExplicitNodeLocks(node.id(), initialVersion());
      

      inside this Runnable's body.
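
      The effect of that move can be illustrated with a minimal, self-contained sketch. This is not Ignite code: the class, the two single-thread executors standing in for disco-event-worker and exchange-worker, and the ReentrantLock standing in for the GridCacheMapEntry lock are all hypothetical. A "tx" thread holds the entry lock until a metadata message arrives through discovery. Because the lock cleanup is handed off to the exchange worker, the discovery thread stays free to deliver that message.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the proposed fix; none of these names are Ignite APIs. */
class DeferredLockCleanupSketch {
    /** Stand-in for the cache entry lock held by the remote tx. */
    private final ReentrantLock entryLock = new ReentrantLock();

    /** Stand-ins for the disco-event-worker and exchange-worker threads. */
    private final ExecutorService discoWorker = Executors.newSingleThreadExecutor();
    private final ExecutorService exchangeWorker = Executors.newSingleThreadExecutor();

    /** Blocks until the entry lock is free, like removeExplicitNodeLocks() in the dump. */
    private void removeExplicitNodeLocks() {
        entryLock.lock();
        try {
            // Lock cleanup would happen here.
        }
        finally {
            entryLock.unlock();
        }
    }

    /** Fixed variant: the blocking cleanup runs on the exchange worker, not on discovery. */
    private void onNodeLeft(CountDownLatch cleanupDone) {
        exchangeWorker.execute(() -> {
            removeExplicitNodeLocks();
            cleanupDone.countDown();
        });
    }

    /** Returns true if the cleanup completes, i.e. discovery was never blocked. */
    static boolean discoveryStaysResponsive() throws InterruptedException {
        DeferredLockCleanupSketch s = new DeferredLockCleanupSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch metadataDelivered = new CountDownLatch(1);
        CountDownLatch cleanupDone = new CountDownLatch(1);

        // The "tx" thread holds the entry lock until the metadata message arrives.
        Thread tx = new Thread(() -> {
            s.entryLock.lock();
            try {
                lockHeld.countDown();
                metadataDelivered.await();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finally {
                s.entryLock.unlock();
            }
        });
        tx.setDaemon(true);
        tx.start();
        lockHeld.await();

        // Discovery handles the node-left event first, then the metadata message.
        s.discoWorker.execute(() -> s.onNodeLeft(cleanupDone));
        s.discoWorker.execute(metadataDelivered::countDown);

        boolean ok = cleanupDone.await(5, TimeUnit.SECONDS);
        s.discoWorker.shutdownNow();
        s.exchangeWorker.shutdownNow();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cleanup completed: " + discoveryStaysResponsive());
    }
}
```

      If onNodeLeft() in this sketch called removeExplicitNodeLocks() directly, the discovery thread would block on the entry lock before it could deliver the metadata message, which is the cycle seen in the thread dump.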

      Attachments

        1. race.txt
          12 kB
          Pavel Voronkin


            People

              Assignee: Unassigned
              Reporter: Pavel Voronkin (voropava)
              Votes: 0
              Watchers: 2
