Samza / SAMZA-1506

Potential orphaned containers problem in SamzaContainer.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

      Description

      We observed an orphaned container in the LinkedIn production environment (running samza-yarn).

      The ContainerHeartbeatMonitor, added as part of SAMZA-871 to solve this problem, was still alive in the orphaned container's Java process but did not shut the process down.

      ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to periodically check whether the container is orphaned.

      The following thread dump of the process shows that the worker thread of the ScheduledExecutorService found the task queue empty and went into the WAITING state (expecting new tasks to be added to the queue).

      "Samza-ContainerHeartbeatMonitor-0" #34 prio=5 os_prio=0 tid=0x00007f9322896800 nid=0x38af waiting on condition [0x00007f92f363e000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x000000070078a0e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

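      If any execution of a Runnable submitted to ScheduledExecutorService.scheduleAtFixedRate throws an exception, all subsequent executions are suppressed. The worker thread itself stays alive and simply waits on the now-empty task queue, which matches the thread dump above.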
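
      The suppression behaviour can be reproduced in isolation. The following is a minimal sketch, not Samza code: the class name SuppressedScheduleDemo, the 50 ms period and the simulated IllegalStateException are all assumptions chosen for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it does not exist in Samza.
public class SuppressedScheduleDemo {

    // Schedules a periodic task that throws an unchecked exception on every run
    // and returns how many times the task was actually executed.
    static int runFailingTask() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        scheduler.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            // Stand-in for any unchecked exception escaping the heartbeat check.
            throw new IllegalStateException("simulated unchecked failure");
        }, 0, 50, TimeUnit.MILLISECONDS);

        // Leave time for roughly ten periods; only the first one will execute.
        Thread.sleep(500);
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs = " + runFailingTask());
    }
}
```

      The task executes once and is never rescheduled. No exception is logged, because it is captured in the ScheduledFuture returned by scheduleAtFixedRate, which nothing in this sketch inspects.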

      The existing ContainerHeartBeatClient implementation, which calls the ApplicationMaster HTTP endpoint to determine container liveness, handles only IOException. Any unchecked exception thrown from that code path will silently stop the periodic check in ContainerHeartbeatMonitor (this is the suspected cause).
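
      One possible mitigation is to catch unchecked exceptions inside the scheduled Runnable so that a single failed check cannot cancel the schedule. The following is a sketch under assumptions, not the fix that was committed for this issue: GuardedHeartbeatDemo and the guarded helper are hypothetical names, and the failing task stands in for the real liveness check.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it does not exist in Samza.
public class GuardedHeartbeatDemo {

    // Wraps a task so that unchecked exceptions are reported instead of
    // propagating to the executor, which would cancel the periodic schedule.
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("Heartbeat check failed, will retry: " + e);
            }
        };
    }

    // Schedules a guarded task that fails on every run and returns the run count.
    static int runGuardedTask() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        Runnable failingCheck = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated unchecked failure");
        };

        scheduler.scheduleAtFixedRate(guarded(failingCheck), 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(500);
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs = " + runGuardedTask());
    }
}
```

      With the guard in place the task keeps running on every period despite failing each time. Whether a failed check should be retried, as here, or treated as a reason to shut the container down is a design decision this sketch does not settle.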

      This requires further investigation.

              People

              • Assignee: Abhishek Shivanna (abkshvn)
              • Reporter: Shanthoosh Venkataraman (spvenkat)