Apache Ozone / HDDS-5626 Track and Address Flaky tests / HDDS-9486

Deadlock between RocksDBCheckpointDiffer#pruneSstFiles and OMDBCheckpointServlet#getCheckpoint, which also causes intermittent fork timeouts in TestSnapshotBackgroundServices


Details

    Description

      The Surefire fork for TestSnapshotBackgroundServices intermittently times out. Relevant threads from the dump are shown below.

      CC hemantk, mladjangadzic

      "CompactionDagPruningService" 
         java.lang.Thread.State: WAITING
              at sun.misc.Unsafe.park(Native Method)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
              at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
              at org.apache.hadoop.ozone.lock.BootstrapStateHandler$Lock.lock(BootstrapStateHandler.java:31)
              at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.pruneSstFiles(RocksDBCheckpointDiffer.java:1506)
              at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$$Lambda$573/124020389.run(Unknown Source)
      
      "qtp555959536-13964" 
         java.lang.Thread.State: BLOCKED
              at org.apache.hadoop.ozone.om.OMDBCheckpointServlet.getCheckpoint(OMDBCheckpointServlet.java:255)
              at org.apache.hadoop.hdds.utils.DBCheckpointServlet.generateSnapshotCheckpoint(DBCheckpointServlet.java:200)
              at org.apache.hadoop.hdds.utils.DBCheckpointServlet.doPost(DBCheckpointServlet.java:321)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:523)
      
      "main"
         java.lang.Thread.State: BLOCKED
              at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.close(RocksDBCheckpointDiffer.java:340)
              at org.apache.hadoop.hdds.utils.IOUtils.close(IOUtils.java:78)
              at org.apache.hadoop.hdds.utils.IOUtils.close(IOUtils.java:64)
              at org.apache.hadoop.hdds.utils.IOUtils.closeQuietly(IOUtils.java:92)
              at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$RocksDBCheckpointDifferHolder.invalidateCacheEntry(RocksDBCheckpointDiffer.java:1591)
              at org.apache.hadoop.hdds.utils.db.RDBStore.close(RDBStore.java:224)
              at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.stop(OmMetadataManagerImpl.java:753)
              at org.apache.hadoop.ozone.om.OzoneManager.stop(OzoneManager.java:2246)
              at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.stop(MiniOzoneHAClusterImpl.java:304)
              at org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:446)
              at org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.shutdown(TestSnapshotBackgroundServices.java:199)
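
One possible reading of the traces above is a lock-order inversion. This is an assumption, since the excerpts do not include lock-ownership lines. The pruning thread would hold a monitor while waiting for the bootstrap semaphore, and the servlet thread would hold the semaphore while waiting for that monitor. The sketch below reproduces that pattern in isolation. All names in it (LockOrderSketch, bootstrapLock, differMonitor, runInverted, runOrdered) are hypothetical stand-ins, not Ozone code. It also shows one common remedy, acquiring both locks in the same order on every thread, which is not necessarily the fix applied for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
public class LockOrderSketch {

  /** Inverted order: pruner takes monitor then lock, servlet takes lock then monitor. */
  static boolean runInverted() {
    Semaphore bootstrapLock = new Semaphore(1);  // stand-in for the bootstrap lock
    Object differMonitor = new Object();         // stand-in for the differ's monitor
    CountDownLatch prunerHoldsMonitor = new CountDownLatch(1);
    CountDownLatch servletHoldsLock = new CountDownLatch(1);
    AtomicBoolean acquired = new AtomicBoolean(false);

    Thread pruner = new Thread(() -> {
      synchronized (differMonitor) {
        prunerHoldsMonitor.countDown();
        try {
          servletHoldsLock.await();
          // Timed acquire so the sketch terminates; an untimed acquire() would hang here.
          if (bootstrapLock.tryAcquire(200, TimeUnit.MILLISECONDS)) {
            acquired.set(true);
            bootstrapLock.release();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    Thread servlet = new Thread(() -> {
      bootstrapLock.acquireUninterruptibly();
      try {
        servletHoldsLock.countDown();
        prunerHoldsMonitor.await();
        synchronized (differMonitor) {
          // checkpoint work would happen here
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        bootstrapLock.release();
      }
    });
    runBoth(pruner, servlet);
    return acquired.get();
  }

  /** Consistent order: both threads take the lock first, then the monitor. */
  static boolean runOrdered() {
    Semaphore bootstrapLock = new Semaphore(1);
    Object differMonitor = new Object();
    AtomicBoolean acquired = new AtomicBoolean(false);

    Thread pruner = new Thread(() -> {
      try {
        if (bootstrapLock.tryAcquire(5, TimeUnit.SECONDS)) {
          try {
            synchronized (differMonitor) {
              acquired.set(true);  // pruning work would happen here
            }
          } finally {
            bootstrapLock.release();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    Thread servlet = new Thread(() -> {
      bootstrapLock.acquireUninterruptibly();
      try {
        synchronized (differMonitor) {
          // checkpoint work would happen here
        }
      } finally {
        bootstrapLock.release();
      }
    });
    runBoth(pruner, servlet);
    return acquired.get();
  }

  /** Starts both threads and waits for them to finish. */
  private static void runBoth(Thread a, Thread b) {
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("inverted order acquired lock: " + runInverted());
    System.out.println("consistent order acquired lock: " + runOrdered());
  }
}
```

In the inverted case the latches force each thread to hold its first lock before requesting the second, so the pruner's timed acquire cannot succeed. With an untimed Semaphore.acquire, as in the trace, both threads would wait forever. A third thread that needs the same monitor, such as a close() call during shutdown, would then block as well.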
      

      Attachments

        1. 2023-09-07T11-48-29_820-jvmRun1.dump
          404 kB
          Attila Doroszlai
        2. 2023-09-14T11-32-20_981-jvmRun1.dump
          379 kB
          Attila Doroszlai

            People

              hemantk Hemant Kumar
              adoroszlai Attila Doroszlai