Samza / SAMZA-1506

Potential orphaned containers problem in SamzaContainer.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      We observed an orphaned container in the LinkedIn production environment (using samza-yarn).

      The ContainerHeartbeatMonitor, added in SAMZA-871 to solve this problem, was still alive in the orphaned container's Java process but did not shut the process down.

      ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to periodically check whether the container is orphaned.

      The following thread dump from that process shows that the worker thread in the ScheduledExecutorService found the task queue empty and entered the waiting state (expecting new tasks to be added to the queue).

      "Samza-ContainerHeartbeatMonitor-0" #34 prio=5 os_prio=0 tid=0x00007f9322896800 nid=0x38af waiting on condition [0x00007f92f363e000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x000000070078a0e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      If the execution of a Runnable submitted to ScheduledExecutorService.scheduleAtFixedRate throws an exception, subsequent executions are suppressed.
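
      The suppression behavior described above can be reproduced in isolation. The following is a minimal sketch, not Samza code: the class name SuppressedScheduleDemo, the 20 ms period, and the simulated IllegalStateException are all assumptions made for illustration. The task throws an unchecked exception on its second run, after which the executor never runs it again, while the worker thread stays alive and parks on the empty queue (consistent with the thread dump above).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SuppressedScheduleDemo {

    // Schedules a periodic task that throws on its second run,
    // waits long enough for many periods to elapse, and returns
    // how many times the task actually ran.
    static int runFailingTask() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(() -> {
            if (runs.incrementAndGet() == 2) {
                // Hypothetical failure standing in for any unchecked exception.
                throw new IllegalStateException("simulated unchecked failure");
            }
        }, 0, 20, TimeUnit.MILLISECONDS);
        try {
            // About 15 periods; without the exception the task would run many more times.
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The future completes exceptionally once the task has thrown.
        System.out.println("future done: " + future.isDone());
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs: " + runFailingTask());
    }
}
```

      Nothing is logged when the task dies unless the caller inspects the returned ScheduledFuture, which is why this kind of failure can go unnoticed.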

      The existing ContainerHeartbeatClient implementation, which calls the ApplicationMaster HTTP endpoint to check container liveness, handles only IOException. Any unchecked exception thrown from that code path propagates out of the scheduled task and silently stops the ContainerHeartbeatMonitor's periodic checks. This is the suspected cause.
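
      One common way to guard against this failure mode is to catch unchecked exceptions inside the scheduled Runnable so that a single failed check does not cancel the schedule. The sketch below illustrates that general technique only; it is not the change made for this issue, and the names GuardedHeartbeatSketch and guarded are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedHeartbeatSketch {

    // Hypothetical helper: wraps a task so that unchecked exceptions are
    // reported and swallowed instead of escaping to the executor.
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("periodic check failed, will retry: " + e);
            }
        };
    }

    // Schedules a guarded task that throws on its second run and
    // returns how many times the task ran in total.
    static int runGuardedTask() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        scheduler.scheduleAtFixedRate(guarded(() -> {
            if (runs.incrementAndGet() == 2) {
                // Same simulated failure as an unguarded task would hit.
                throw new IllegalStateException("simulated unchecked failure");
            }
        }), 0, 20, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs: " + runGuardedTask());
    }
}
```

      With the guard in place the task keeps running after the failure. Whether a failed check should be retried or treated as evidence that the container is orphaned is a separate design decision that this sketch does not address.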

      This requires further investigation.

            People

              abkshvn Abhishek Shivanna
              spvenkat Shanthoosh Venkataraman
              Votes: 0
              Watchers: 3