Apache AsterixDB / ASTERIXDB-1170

Deadlock in shutdown with DatasetLifecycleManager

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

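      While cancelling a test run, I observed this deadlock involving the DatasetLifecycleManager. It looks like one insert task thread is holding the PrimaryIndexOperationTracker but needs the monitor on the DatasetLifecycleManager, while a second task thread, inside DatasetLifecycleManager.open, holds that monitor and needs the operation tracker. The checkpoint thread started by the shutdown hook then blocks on the same DatasetLifecycleManager monitor, which prevents clean shutdown.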
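The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not AsterixDB code: the two plain objects are hypothetical stand-ins for the DatasetLifecycleManager and PrimaryIndexOperationTracker monitors, and the thread names are made up. It uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call to confirm that the JVM sees the cycle, and daemon threads so the process can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two monitors in the report.
    private static final Object LIFECYCLE_MANAGER = new Object(); // plays <0x17dc>
    private static final Object OP_TRACKER = new Object();        // plays <0x17dd>

    /**
     * Starts two daemon threads that take the monitors in opposite order
     * and returns how many threads the JVM reports as monitor-deadlocked
     * (0 if no deadlock is seen within about five seconds).
     */
    static int countDeadlockedThreads() {
        CountDownLatch bothHoldFirstMonitor = new CountDownLatch(2);

        // Like the insert path: operation tracker first, then the manager.
        Thread inserter = new Thread(() -> {
            synchronized (OP_TRACKER) {
                meetOtherThread(bothHoldFirstMonitor);
                synchronized (LIFECYCLE_MANAGER) {
                    // Not reached once the cycle has formed.
                }
            }
        }, "inserter");

        // Like the open path: manager first, then the operation tracker.
        Thread opener = new Thread(() -> {
            synchronized (LIFECYCLE_MANAGER) {
                meetOtherThread(bothHoldFirstMonitor);
                synchronized (OP_TRACKER) {
                    // Not reached once the cycle has formed.
                }
            }
        }, "opener");

        // Daemon threads do not keep the JVM alive after main returns.
        inserter.setDaemon(true);
        opener.setDaemon(true);
        inserter.start();
        opener.start();

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = threadBean.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    // Waits until both threads hold their first monitor, so the
    // inversion happens deterministically instead of by timing luck.
    private static void meetOtherThread(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The latch is only there to make the interleaving reliable. In the reported case the same interleaving happened by chance during test cancellation.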

      "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996" daemon prio=5 tid=0x74 nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.asterix.common.context.DatasetLifecycleManager.allocateDatasetMemory(DatasetLifecycleManager.java:639)
      at org.apache.asterix.common.context.PrimaryIndexOperationTracker.beforeOperation(PrimaryIndexOperationTracker.java:64)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.enterComponents(LSMHarness.java:180)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.getAndEnterComponents(LSMHarness.java:115)
      - locked <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:333)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:327)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.insert(LSMTreeIndexAccessor.java:50)
      at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:102)
      at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:342)
      at org.apache.hyracks.control.nc.Task.run(Task.java:290)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995" daemon prio=5 tid=0x77 nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      blocks Thread-55@5983
      blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996 to release lock on <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
      at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:205)
      - locked <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:116)
      at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.open(AsterixLSMInsertDeleteOperatorNodePushable.java:61)
      at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:334)
      at org.apache.hyracks.control.nc.Task.run(Task.java:290)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      "Thread-55@5983" prio=5 tid=0x5a nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.asterix.common.context.DatasetLifecycleManager.flushAllDatasets(DatasetLifecycleManager.java:474)
      at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.checkpoint(RecoveryManager.java:406)
      - locked <0x17f5> (a org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
      at org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.stop(NCApplicationEntryPoint.java:132)
      at org.apache.hyracks.control.nc.NodeControllerService.stop(NodeControllerService.java:347)
      - locked <0x17f7> (a org.apache.hyracks.control.nc.NodeControllerService)
      at org.apache.hyracks.control.nc.NodeControllerService$JVMShutdownHook.run(NodeControllerService.java:588)
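The dumps show the two task threads taking the same pair of monitors in opposite order. One common way to rule out such a cycle is to give every code path a single acquisition order. The sketch below only illustrates that idea with hypothetical names. It is not the change that resolved this issue, and the real classes may need a different approach.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FixedLockOrderSketch {
    // Hypothetical stand-ins for the two monitors in the dumps.
    private static final Object LIFECYCLE_MANAGER = new Object();
    private static final Object OP_TRACKER = new Object();

    // Guarded by OP_TRACKER.
    private static long operations = 0;

    // Both paths take LIFECYCLE_MANAGER first and OP_TRACKER second,
    // so neither can hold one monitor while waiting on the other in reverse.
    static void insertPath() {
        synchronized (LIFECYCLE_MANAGER) {
            synchronized (OP_TRACKER) {
                operations++;
            }
        }
    }

    static void openPath() {
        synchronized (LIFECYCLE_MANAGER) {
            synchronized (OP_TRACKER) {
                operations++;
            }
        }
    }

    /**
     * Runs both paths concurrently and reports whether both workers
     * finished within 30 seconds.
     */
    static boolean runBothPaths(int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                insertPath();
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                openPath();
            }
        });
        pool.shutdown();
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("finished: " + runBothPaths(100_000));
    }
}
```

A fixed order trades some concurrency for safety, since the outer monitor is held for the whole operation. Whether that cost is acceptable depends on how hot the insert path is.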

        Attachments

        1. trace.txt
          64 kB
          Ian Maxon

              People

              • Assignee: Murtadha Makki Al Hubail (mhubail)
              • Reporter: Ian Maxon (imaxon)
              • Votes: 0
              • Watchers: 3