Uploaded image for project: 'Camel'
  1. Camel
  2. CAMEL-13358

Sometimes SedaConsumer does not start and remains stuck

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.19.0
    • 3.0.0
    • camel-core
    • None
    • Unknown

    Description

      Sometimes we notice that some of our routes get "stuck" and do not process anything.  Looking at the stacktrace for such a route we see something like this:

      "Camel (SomeName) thread #648 - seda://someName" #1287 daemon prio=5 os_prio=0 tid=0x00007f23880cf000 nid=0x25006 waiting on condition [0x00007f1f23530000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000006e4beb8e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)
      

      whereas we would instead normally expect to see something like this:

      "Camel (SomeName) thread #840 - seda://someName" #1577 daemon prio=5 os_prio=0 tid=0x00007f227c02b000 nid=0x256fe waiting on condition [0x00007f1f0ca20000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000006eb69c1f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
              at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
              at org.apache.camel.component.seda.SedaConsumer.doRun(SedaConsumer.java:198)
              at org.apache.camel.component.seda.SedaConsumer.run(SedaConsumer.java:154)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)
      

      Note that whereas this latter example has SedaConsumer in the stacktrace polling for an exchange, the former example has no such SedaConsumer in the stacktrace and is stuck waiting for a new "task" to be given to the ThreadPoolExecutor.

      After adding debug logging, I see that the issue is as follows:

      The code in ServiceSupport here writes to the starting and started flags using one thread (call it Thread W):

          public void start() throws Exception {
              if (isStarting() || isStarted()) {
                  // only start service if not already started
                  LOG.trace("Service already started");
                  return;
              }
              if (starting.compareAndSet(false, true)) {
                  LOG.trace("Starting service");
                  try {
                      doStart();
                      started.set(true);
                      starting.set(false);
      

      and on a different thread (call it thread R) this code in ServiceSupport reads the starting and started flags:

          public boolean isRunAllowed() {
              // if we have not yet initialized, then all options is false
              boolean unused1 = !started.get() && !starting.get() && !stopping.get() && !stopped.get();
              boolean unused2 = !suspending.get() && !suspended.get() && !shutdown.get() && !shuttingdown.get();
              if (unused1 && unused2) {
                  return false;
              }
              return !isStoppingOrStopped();
          }
      

      So the order of events is:

      1. Thread W: calls ServiceSupport.start()
      2. Thread W: compareAndSet starting from false to true
      3. Thread W: calls ServiceSupport.doStart() which eventually creates and starts Thread R
      4. Thread R: calls SedaConsumer.doRun() which calls isRunAllowed()
      5. Thread R: in isRunAllowed() reads started as false
      6. Thread W: set started to true
      7. Thread W: set starting to false
      8. Thread R: in isRunAllowed() reads starting as false
      9. Thread R: concludes that isRunAllowed is false
      10. Thread R: the while loop in SedaConsumer.doRun is not executed and the thread will forever remain stuck waiting for a new "task" (on ThreadPoolExecutor.java:1067)

      One fix would be to swap reading of starting and started so that starting is read first, then started.

      Looking at the latest code in Camel 3.0.x, this code has been refactored to use a single "status" field instead of multiple different AtomicBoolean fields. That should also fix this bug. However, for completeness, I thought it best to log this bug anyway.

      Attachments

        Activity

          People

            davsclaus Claus Ibsen
            graeme_moss Graeme Moss
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: