Brooklyn / BROOKLYN-214

OutOfMemoryError (too many threads): repeated calls to AttributeWhenReady

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None

    Description

      When launching Clocker, an OutOfMemoryError was encountered due to too many threads. The underlying cause is repeated execution of tasks that resolve an AttributeWhenReady, where each task blocks a thread.

      The exception encountered was:

      2016-01-11 16:36:32,460 DEBUG o.a.b.u.c.t.BasicExecutionManager [brooklyn-execmanager-vzwdtuv4-5490]: Exception running task Task[machine.loadAverage @ h2jAHTjo <- ssh[uptime->machine.loadAverage]:LBUslVfG] (rethrowing): unable to create new native thread
      java.lang.OutOfMemoryError: unable to create new native thread
      

      Shortly before the OOME, this was the resource usage:

      2016-01-11 16:36:26,884 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 202 MB / 310 MB memory (122 kB soft); 1987 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 1835 active, 1040 unfinished; 1425 remembered, 169790 total submitted)
      

      A thread dump shows 977 threads waiting for a lock on org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady, e.g.

      "brooklyn-execmanager-vzwdtuv4-1859" #57280 daemon prio=5 os_prio=31 tid=0x00007fa0baef0000 nid=0xf307 waiting for monitor entry [0x0000700009780000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.brooklyn.camp.brooklyn.spi.dsl.BrooklynDslDeferredSupplier.get(BrooklynDslDeferredSupplier.java:93)
              - waiting to lock <0x0000000784bc2828> (a org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady)
              at org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322)
              at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:342)
              at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:493)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      The one thread holding that lock is doing:

      "brooklyn-execmanager-vzwdtuv4-1864" #57290 daemon prio=5 os_prio=31 tid=0x00007fa0bbc19800 nid=0x76e7 waiting on condition [0x00007000061e1000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000007851a4cb8> (a java.util.concurrent.FutureTask)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
              at java.util.concurrent.FutureTask.get(FutureTask.java:191)
              at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
              at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:342)
              at org.apache.brooklyn.camp.brooklyn.spi.dsl.BrooklynDslDeferredSupplier.get(BrooklynDslDeferredSupplier.java:105)
              - locked <0x0000000784bc2828> (a org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent$AttributeWhenReady)
              at org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322)
              at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:342)
              at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:493)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      Searching the thread dump for the caller of org.apache.brooklyn.util.core.task.ValueResolver$2.call(ValueResolver.java:322) turns up only two instances. This suggests that the other calls (to ValueResolver.getMaybeInternal()) all had a short timeout: getMaybeInternal() waits up to the given timeout for the resolved value, then calls task.cancel(true) before returning.
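
      To illustrate why the timeout-then-cancel pattern leaves threads behind, here is a minimal standalone sketch. It is not Brooklyn code: the class SynchronizedCancelSketch, the MONITOR object and the method name are hypothetical, and the calling thread stands in for the one task that holds the lock. It only assumes standard java.util.concurrent behaviour, where cancel(true) interrupts the worker but a thread waiting to enter a synchronized block does not react to the interrupt.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; none of these names come from Brooklyn.
class SynchronizedCancelSketch {
    private static final Object MONITOR = new Object();

    /** Returns the worker thread's state after its task was cancelled while waiting for MONITOR. */
    static Thread.State workerStateAfterCancel() {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicReference<Thread> worker = new AtomicReference<>();
        try {
            synchronized (MONITOR) { // the caller plays the thread that holds the lock
                Future<?> task = pool.submit(() -> {
                    worker.set(Thread.currentThread());
                    synchronized (MONITOR) {
                        // would resolve the value here; not reached while the caller holds MONITOR
                    }
                });
                // Wait until the worker is parked at the monitor entry.
                while (worker.get() == null || worker.get().getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
                try {
                    task.get(50, TimeUnit.MILLISECONDS); // short timeout, as in the scenario above
                } catch (TimeoutException e) {
                    task.cancel(true); // marks the task cancelled and interrupts the worker
                }
                Thread.sleep(100); // time for the interrupt to take effect, if it could
                return worker.get().getState();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

      In this sketch the returned state is still BLOCKED after cancel(true): the Future reports itself as cancelled, but its thread stays occupied until the monitor is released. Repeating the call with a short timeout therefore adds one stuck thread each time.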

      Because the tasks' threads are blocked waiting for a synchronized lock, the interrupt from cancel(true) has no effect on them. One part of the fix is to change BrooklynDslDeferredSupplier.get (BrooklynDslDeferredSupplier.java:93) to use a java.util.concurrent.locks.Lock, which can be acquired interruptibly. However, that still feels unsafe: other code could be using Java's synchronized.
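
      A rough sketch of the interruptible-lock idea follows, assuming a ReentrantLock in place of the synchronized block. This is an illustration, not the actual change made to BrooklynDslDeferredSupplier: the class InterruptibleLockSketch, the LOCK field and the method name are all hypothetical, and the calling thread again stands in for the lock holder.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from Brooklyn.
class InterruptibleLockSketch {
    private static final ReentrantLock LOCK = new ReentrantLock();

    /** Returns true if the worker thread exits promptly once its task is cancelled. */
    static boolean workerExitsAfterCancel() {
        ExecutorService pool = Executors.newCachedThreadPool();
        CountDownLatch workerExited = new CountDownLatch(1);
        LOCK.lock(); // the caller plays the thread that holds the lock
        try {
            Future<String> task = pool.submit(() -> {
                try {
                    // Unlike entering a synchronized block, this wait ends with
                    // InterruptedException when the thread is interrupted.
                    LOCK.lockInterruptibly();
                    try {
                        return "resolved value";
                    } finally {
                        LOCK.unlock();
                    }
                } finally {
                    workerExited.countDown();
                }
            });
            // Wait until the worker is queued on the lock.
            while (!LOCK.hasQueuedThreads()) {
                Thread.sleep(10);
            }
            task.cancel(true); // interrupts the worker while the lock is still held
            return workerExited.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            LOCK.unlock();
            pool.shutdownNow();
        }
    }
}
```

      Here the worker leaves lockInterruptibly() as soon as it is cancelled, even though the lock is never released to it, so its thread becomes free. This does nothing for code elsewhere that still blocks on synchronized.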

      Finding where ValueResolver.timeout(Duration) is set could tell us where these roughly 977 calls came from. One place is the REST API, in RestValueResolver.getImmediateValue; if the web console were polling for the entity's config, that could explain it. Another is the org.apache.brooklyn.enricher.stock.Transformer enricher.

      This was encountered with 0.9.0-SNAPSHOT.

          People

            Assignee: Unassigned
            Reporter: Aled Sage (aled.sage)