Apache AsterixDB / ASTERIXDB-1170

Deadlock in shutdown with DatasetLifecycleManager

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

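      While cancelling a test run, I observed this deadlock in the DatasetLifecycleManager. It looks like one insert task holds the PrimaryIndexOperationTracker monitor and needs the monitor on the DatasetLifecycleManager, while a second task, inside DatasetLifecycleManager.open, holds the DatasetLifecycleManager monitor and needs the op tracker. The checkpoint thread started by the shutdown hook then blocks on the DatasetLifecycleManager monitor as well, which prevents a clean shutdown.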

      "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996" daemon prio=5 tid=0x74 nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.asterix.common.context.DatasetLifecycleManager.allocateDatasetMemory(DatasetLifecycleManager.java:639)
      at org.apache.asterix.common.context.PrimaryIndexOperationTracker.beforeOperation(PrimaryIndexOperationTracker.java:64)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.enterComponents(LSMHarness.java:180)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.getAndEnterComponents(LSMHarness.java:115)
      - locked <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:333)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:327)
      at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.insert(LSMTreeIndexAccessor.java:50)
      at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:102)
      at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:342)
      at org.apache.hyracks.control.nc.Task.run(Task.java:290)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995" daemon prio=5 tid=0x77 nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      blocks Thread-55@5983
      blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996 to release lock on <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
      at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:205)
      - locked <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:116)
      at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.open(AsterixLSMInsertDeleteOperatorNodePushable.java:61)
      at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:334)
      at org.apache.hyracks.control.nc.Task.run(Task.java:290)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      "Thread-55@5983" prio=5 tid=0x5a nid=NA waiting for monitor entry
      java.lang.Thread.State: BLOCKED
      waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
      at org.apache.asterix.common.context.DatasetLifecycleManager.flushAllDatasets(DatasetLifecycleManager.java:474)
      at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.checkpoint(RecoveryManager.java:406)
      - locked <0x17f5> (a org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
      at org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.stop(NCApplicationEntryPoint.java:132)
      at org.apache.hyracks.control.nc.NodeControllerService.stop(NodeControllerService.java:347)
      - locked <0x17f7> (a org.apache.hyracks.control.nc.NodeControllerService)
      at org.apache.hyracks.control.nc.NodeControllerService$JVMShutdownHook.run(NodeControllerService.java:588)
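
The lock-order inversion visible in the trace above can be reproduced in isolation. The following is a minimal Java sketch, not AsterixDB code: `LockOrderSketch`, `worker`, and `deadlocks` are hypothetical names, and two plain `Object` monitors stand in for the DatasetLifecycleManager (<0x17dc>) and PrimaryIndexOperationTracker (<0x17dd>) monitors. It says nothing about how this issue was actually fixed. It only illustrates that taking the two monitors in opposite orders can deadlock, while taking them in one consistent order cannot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockOrderSketch {

    // Builds a daemon thread that takes monitor `first`, then monitor `second`.
    // The latch makes each thread wait until all parties hold their first
    // monitor, so the inverted case deadlocks reliably instead of by chance.
    static Thread worker(String name, Object first, Object second, CountDownLatch holdingFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                holdingFirst.countDown();
                try {
                    holdingFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Both monitors held; the real work would happen here.
                }
            }
        }, name);
        t.setDaemon(true); // a stuck worker must not keep the JVM alive
        return t;
    }

    // Runs an "open" path and a "modify" path and reports whether the JVM
    // sees them deadlocked on object monitors.
    static boolean deadlocks(boolean invertOrder) {
        Object lifecycleManager = new Object(); // stand-in for <0x17dc>
        Object opTracker = new Object();        // stand-in for <0x17dd>
        // With a consistent order the latch is created at zero, so it never blocks.
        CountDownLatch holdingFirst = new CountDownLatch(invertOrder ? 2 : 0);

        Thread open = worker("open", lifecycleManager, opTracker, holdingFirst);
        Thread modify = invertOrder
                ? worker("modify", opTracker, lifecycleManager, holdingFirst)
                : worker("modify", lifecycleManager, opTracker, holdingFirst);
        open.start();
        modify.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                for (long id : ids) {
                    if (id == open.getId() || id == modify.getId()) {
                        return true;
                    }
                }
            }
            if (!open.isAlive() && !modify.isAlive()) {
                return false; // both paths completed normally
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("inverted order deadlocks:   " + deadlocks(true));
        System.out.println("consistent order deadlocks: " + deadlocks(false));
    }
}
```

In the inverted case the two workers end up in the same state as the two SuperActivity threads in the dump: each is BLOCKED on the monitor the other holds. A third thread that then needs either monitor, as the checkpoint thread does during shutdown, would block behind them.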

      Attachments

        1. trace.txt (64 kB, Ian Maxon)

            People

              mhubail Murtadha Makki Al Hubail
              imaxon Ian Maxon