Samza / SAMZA-1506

Potential orphaned containers problem in SamzaContainer.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

      Description

      We noticed an orphaned container in the LinkedIn production environment (using samza-yarn).

      The ContainerHeartbeatMonitor, added as part of SAMZA-871 to solve this problem, was still alive in the orphaned container's Java process but did not shut the process down.

      ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to periodically check whether the container is orphaned.

      The following thread dump of the process shows that the worker thread in the ScheduledExecutorService found the task queue empty and entered the WAITING state (expecting new tasks to be added to the queue). In other words, the periodic heartbeat check was no longer scheduled.

      "Samza-ContainerHeartbeatMonitor-0" #34 prio=5 os_prio=0 tid=0x00007f9322896800 nid=0x38af waiting on condition [0x00007f92f363e000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x000000070078a0e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      If the execution of a Runnable submitted to ScheduledExecutorService.scheduleAtFixedRate throws an exception, subsequent executions are suppressed.
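
      The suppression behaviour can be reproduced in isolation. The following is a minimal sketch, not Samza code: the class name SuppressedScheduleDemo, the 20 ms period and the simulated exception are all assumptions made for illustration. It schedules a task that throws an unchecked exception on every run and counts how many times the task actually executes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Samza.
class SuppressedScheduleDemo {

    // Schedules a periodic task that always throws, then reports how many times it ran.
    static int runsAfterFailure() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(() -> {
                runs.incrementAndGet();
                throw new IllegalStateException("simulated unchecked failure");
            }, 0, 20, TimeUnit.MILLISECONDS);

            try {
                // For a periodic task, get() only returns (by throwing) once the task
                // has failed or been cancelled.
                future.get();
            } catch (ExecutionException expected) {
                // The exception thrown by the task is captured here, not logged anywhere.
            }

            // Wait for several periods to confirm that no further runs happen.
            Thread.sleep(200);
            return runs.get();
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("executions: " + runsAfterFailure());
    }
}
```

      After the first failure the worker thread stays alive but has nothing left to run, which is consistent with the parked thread in the dump above. The exception itself is only visible through the returned ScheduledFuture, so it is easy to miss if the future is never inspected.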

      The existing ContainerHeartBeatClient implementation, which accesses the ApplicationMaster HTTP endpoint to get container liveness, handles only IOException. Any unchecked exception thrown from that code path will silently stop the periodic check in ContainerHeartbeatMonitor (this is the suspected cause).
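
      One possible mitigation, shown here only as a sketch and not as the fix that was applied, is to wrap the periodic check so that unchecked exceptions are reported instead of escaping into the executor. The names GuardedHeartbeatDemo, guarded and runsWithGuard are hypothetical, and the failing check stands in for the real heartbeat call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical demo class; not part of Samza.
class GuardedHeartbeatDemo {

    // Wraps a periodic check so an unchecked exception does not cancel future runs.
    // Errors (e.g. OutOfMemoryError) are deliberately not caught in this sketch.
    static Runnable guarded(Runnable check, Consumer<RuntimeException> onError) {
        return () -> {
            try {
                check.run();
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        };
    }

    // Runs a check that always throws and waits until it has executed `target` times.
    static int runsWithGuard(int target) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch reached = new CountDownLatch(target);
        try {
            Runnable failingCheck = () -> {
                runs.incrementAndGet();
                reached.countDown();
                throw new IllegalStateException("simulated unchecked failure");
            };
            scheduler.scheduleAtFixedRate(
                guarded(failingCheck, e -> System.err.println("heartbeat check failed: " + e)),
                0, 20, TimeUnit.MILLISECONDS);

            // Bounded wait so the demo cannot hang if scheduling stops unexpectedly.
            reached.await(5, TimeUnit.SECONDS);
            return runs.get();
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("executions with guard: " + runsWithGuard(3));
    }
}
```

      With the wrapper in place the task keeps being rescheduled even though every run fails. Whether a failed check should be retried, logged, or treated as a reason to shut the container down is a design decision this sketch leaves open.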

      This requires further investigation.

            People

            • Assignee: Abhishek Shivanna (abkshvn)
            • Reporter: Shanthoosh Venkataraman (spvenkat)
