  1. Jackrabbit Oak
  2. OAK-8986

Segment flush thread can remain in TIMED_WAITING state even when segment queue is empty

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.24.0, 1.26.0
    • Fix Version/s: 1.30.0
    • Component/s: segment-azure
    • Labels: None
    • Flags: Patch, Important

    Description

      If a thread is in the interrupted state while it executes SegmentWriteQueue.addToQueue, an InterruptedException is thrown and wrapped in an IOException.

      Right before calling queue.offer, the element is added to the segmentsByUUID map, and in this case it is never removed.
      Normally the removal happens in the thread that reads from the queue and invokes consume(SegmentWriteAction segment).

      Since the item is not removed from the segmentsByUUID map, the flusher thread remains in TIMED_WAITING state.

      The TarMK flush thread exclusively holds a monitor needed by a number of other threads, causing the repository to be blocked.
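
      The sequence described above can be reproduced in a simplified model. The sketch below is not the Oak source: the class QueueSketch, its String payloads, the 100 ms offer timeout and the bounded flush are assumptions made for illustration. Only the names segmentsByUUID, addToQueue, queue.offer and flush are taken from this report. The "safe" variant shows one possible remedy (removing the map entry when the offer fails); the attached patches may do this differently.

```java
import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the reported pattern; not the Oak implementation.
class QueueSketch {
    final Map<UUID, String> segmentsByUUID = new ConcurrentHashMap<>();
    final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);

    // Mirrors the reported behaviour: the map entry survives a failed offer.
    void addToQueueLeaky(UUID id, String segment) throws IOException {
        segmentsByUUID.put(id, segment);
        try {
            // offer(...) throws InterruptedException if the caller is already interrupted
            if (!queue.offer(segment, 100, TimeUnit.MILLISECONDS)) {
                throw new IOException("queue full");
            }
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    // Assumed remedy: undo the map insertion whenever the element was not queued.
    void addToQueueSafe(UUID id, String segment) throws IOException {
        segmentsByUUID.put(id, segment);
        boolean queued = false;
        try {
            queued = queue.offer(segment, 100, TimeUnit.MILLISECONDS);
            if (!queued) {
                throw new IOException("queue full");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IOException(e);
        } finally {
            if (!queued) {
                segmentsByUUID.remove(id);
            }
        }
    }

    // Bounded stand-in for the flush loop: waits until the map drains or the deadline passes.
    boolean flush(long maxWaitMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        synchronized (segmentsByUUID) {
            while (!segmentsByUUID.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // a real, unbounded loop would keep waiting here
                }
                try {
                    segmentsByUUID.wait(Math.min(remaining, 100));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }

    // Adds one segment from an already-interrupted thread, then tries to flush.
    static boolean flushCompletes(boolean safe) {
        QueueSketch q = new QueueSketch();
        Thread.currentThread().interrupt(); // simulate a caller that is already interrupted
        try {
            if (safe) {
                q.addToQueueSafe(UUID.randomUUID(), "segment");
            } else {
                q.addToQueueLeaky(UUID.randomUUID(), "segment");
            }
        } catch (IOException expected) {
            // the interrupted offer surfaces as an IOException in both variants
        } finally {
            Thread.interrupted(); // clear the flag so flush() is able to wait
        }
        return q.flush(300);
    }

    public static void main(String[] args) {
        System.out.println("leaky variant, flush completes: " + flushCompletes(false));
        System.out.println("safe variant, flush completes: " + flushCompletes(true));
    }
}
```

      In this model the leaky variant leaves a stale entry in the map while the queue itself is empty, so no consumer will ever remove it and the flush loop cannot finish. The 300 ms bound exists only so that the demonstration terminates.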

      "TarMK flush [/opt/aem/launcher/repository/segmentstore-composite-global]" #82 daemon prio=5 os_prio=0 cpu=83628.24ms elapsed=291420.48s tid=0x00007fce902f3000 nid=0x1c2b in Object.wait()  [0x00007fce00aa5000]
         java.lang.Thread.State: TIMED_WAITING (on object monitor)
      	at java.lang.Object.wait(java.base@11.0.3/Native Method)
      	- waiting on <no object reference available>
      	at org.apache.jackrabbit.oak.segment.azure.queue.SegmentWriteQueue.flush(SegmentWriteQueue.java:183)
      	- waiting to re-lock in wait() <0x00000006b4911830> (a java.util.concurrent.ConcurrentHashMap)
      	at org.apache.jackrabbit.oak.segment.azure.AzureSegmentArchiveWriter.flush(AzureSegmentArchiveWriter.java:187)
      	at org.apache.jackrabbit.oak.segment.file.tar.TarWriter.flush(TarWriter.java:186)
      	- locked <0x00000006b4911960> (a java.lang.Object)
      	at org.apache.jackrabbit.oak.segment.file.tar.TarFiles.flush(TarFiles.java:535)
      	at org.apache.jackrabbit.oak.segment.file.FileStore.lambda$tryFlush$9(FileStore.java:359)
      	at org.apache.jackrabbit.oak.segment.file.FileStore$$Lambda$232/0x000000080067ac40.flush(Unknown Source)
      	at org.apache.jackrabbit.oak.segment.file.TarRevisions.doFlush(TarRevisions.java:236)
      	at org.apache.jackrabbit.oak.segment.file.TarRevisions.tryFlush(TarRevisions.java:216)
      	at org.apache.jackrabbit.oak.segment.file.FileStore.tryFlush(FileStore.java:357)
      	at org.apache.jackrabbit.oak.segment.file.FileStore.lambda$new$5(FileStore.java:212)
      	at org.apache.jackrabbit.oak.segment.file.FileStore$$Lambda$203/0x000000080064b440.run(Unknown Source)
      	at org.apache.jackrabbit.oak.segment.file.SafeRunnable.run(SafeRunnable.java:67)
      	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515)
      	at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.3/FutureTask.java:305)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.3/ScheduledThreadPoolExecutor.java:305)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628)
      	at java.lang.Thread.run(java.base@11.0.3/Thread.java:834)
      
      

      Here is the test case that demonstrates the problem. 

      test.patch

      Attachments

        1. test.patch
          4 kB
          Miroslav Smiljanic
        2. test_and_proposed_patch.patch
          4 kB
          Miroslav Smiljanic
        3. proposed_patch.patch
          0.8 kB
          Miroslav Smiljanic
        4. OAK-8986.patch
          4 kB
          Marcel Reutegger

          People

            mreutegg Marcel Reutegger
            miroslav Miroslav Smiljanic