Uploaded image for project: 'MINA'
  1. MINA
  2. DIRMINA-898

A wakeup() is issued in a thread before the target select() is executed.

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4
    • Fix Version/s: 2.0.5
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Description

      Context:
      -------
      I am using BarchartUDT (BC for short in the calls traced below) as transport.
      I have integrated it with MINA release 2.0.4.

      BarchartUDT is a java wrapper to the native UDT library (C++). UDT is a reliable,
      connection-oriented, streaming protocol.

      The example below is the MINA sample timeserver that hangs in the first bind() call.

      Description:
      -----------
      Trigger : A wakeup() is issued in a thread before the target select() is executed
      in another thread, and it is lost.

      It is not clear if this is a bug. The MINA code in AbstractPollingIoAcceptor seems to rely
      on the fact that the wakeup issued before a select in not lost. But the fact is that the
      two threads are racing, and if the wakeup IS lost the select will remain blocked forever,
      as in the example below.

      Details:
      -------
      IN AbstractPollingIoAcceptor.bindInternal() the expected program order is :
      1. startupAcceptor() -> Acceptor.run() -> select() -> BC::performSelect...
      2. wakeup -> NioAcceptor.wakeup() -> selector.wakeup() -> BC:: wakeup

      At runtime, 2. and 1. above are reversed, so the wakeup is lost and 1. hangs as a result.

      CALL TRACE
      ---------
      Thread interleaving starting with AbstractPollingIoAcceptor.bind(Iterable):

      — AbstractIoAcceptor::bind(Iterable): calling bindInternal
      — bindInternal: calling startupAcceptor
      startupAcceptor: Starting NEW Acceptor
      — bindInternal: calling wakeup()
      :::BC : wakeup() sets wakeupStepCount to 1
      — bindInternal: before await
      Accepptor.run: calling select()
      :::BC : performSelect calling doSelectSocketUDT with wakeupBaseCount 0 wakeupStepCount 1

      THREAD DUMP
      -----------
      The 2 threads waiting (one in select, the other in await) :

      Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode):

      "NioUDTSocketAcceptor-1" prio=10 tid=0x00002aaab400d000 nid=0x5f8f runnable [0x0000000040f04000]
      java.lang.Thread.State: RUNNABLE
      at com.barchart.udt.SocketUDT.select1(Native Method)
      at com.barchart.udt.SocketUDT.select(SocketUDT.java:644)
      at com.barchart.udt.nio.SelectorUDT.doSelectSocketUDT(SelectorUDT.java:406)
      at com.barchart.udt.nio.SelectorUDT.performSelect(SelectorUDT.java:528)
      at com.barchart.udt.nio.SelectorUDT.doSelectReally(SelectorUDT.java:575)
      at com.barchart.udt.nio.SelectorUDT.doSelectInsideLock(SelectorUDT.java:333)

      • locked <0x00000000eb2d3ee0> (a com.barchart.udt.nio.HelperNIOUDT$1)
      • locked <0x00000000eb2d3ef0> (a java.util.Collections$UnmodifiableSet)
      • locked <0x00000000eb2d3e78> (a com.barchart.udt.nio.SelectorUDT)
        at com.barchart.udt.nio.SelectorUDT.select(SelectorUDT.java:191)
        at com.barchart.udt.nio.SelectorUDT.select(SelectorUDT.java:181)
        at org.apache.mina.transport.socket.nio.NioUDTSocketAcceptor.select(NioUDTSocketAcceptor.java:279)
        at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.jav
        a:413)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

      "Low Memory Detector" daemon prio=10 tid=0x00000000454ca000 nid=0x5f8d runnable [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE

      "C2 CompilerThread1" daemon prio=10 tid=0x00000000454c7000 nid=0x5f8c waiting on condition [0x00000000000000
      00]
      java.lang.Thread.State: RUNNABLE

      "C2 CompilerThread0" daemon prio=10 tid=0x00000000454c2800 nid=0x5f8b waiting on condition [0x00000000000000
      00]
      java.lang.Thread.State: RUNNABLE

      "Signal Dispatcher" daemon prio=10 tid=0x00000000454c0000 nid=0x5f8a waiting on condition [0x000000000000000
      0]
      java.lang.Thread.State: RUNNABLE

      "Finalizer" daemon prio=10 tid=0x000000004549e000 nid=0x5f89 in Object.wait() [0x0000000041cf1000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000eb2d4220> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
      • locked <0x00000000eb2d4220> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

      "Reference Handler" daemon prio=10 tid=0x000000004549c000 nid=0x5f88 in Object.wait() [0x0000000041bf0000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000eb2d00b0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
      • locked <0x00000000eb2d00b0> (a java.lang.ref.Reference$Lock)

      "main" prio=10 tid=0x0000000045440000 nid=0x5f86 in Object.wait() [0x0000000040644000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000eb2d42c0> (a org.apache.mina.core.service.AbstractIoAcceptor$AcceptorOperati
        onFuture)
        at org.apache.mina.core.future.DefaultIoFuture.await0(DefaultIoFuture.java:189)
      • locked <0x00000000eb2d42c0> (a org.apache.mina.core.service.AbstractIoAcceptor$AcceptorOperationFu
        ture)
        at org.apache.mina.core.future.DefaultIoFuture.awaitUninterruptibly(DefaultIoFuture.java:130)
        at org.apache.mina.core.polling.AbstractPollingIoAcceptor.bindInternal(AbstractPollingIoAcceptor.jav
        a:330)
        at org.apache.mina.core.service.AbstractIoAcceptor.bind(AbstractIoAcceptor.java:291)
      • locked <0x00000000eb2d4308> (a java.lang.Object)
        at org.apache.mina.core.service.AbstractIoAcceptor.bind(AbstractIoAcceptor.java:226)
        at MinaTimeServer.main(MinaTimeServer.java:86)

      "VM Thread" prio=10 tid=0x0000000045495000 nid=0x5f87 runnable

      "VM Periodic Task Thread" prio=10 tid=0x00000000454d5000 nid=0x5f8e waiting on condition

      JNI global references: 1193

      ------------------------------------------------------------------------------------------

      CODE - this is the code that causes it :

      protected final Set<SocketAddress> bindInternal(
      List<? extends SocketAddress> localAddresses) throws Exception {
      // Create a bind request as a Future operation. When the selector
      // have handled the registration, it will signal this future.
      AcceptorOperationFuture request = new AcceptorOperationFuture(
      localAddresses);

      // adds the Registration request to the queue for the Workers
      // to handle
      registerQueue.add(request);

      // creates the Acceptor instance and has the local
      // executor kick it off.
      System.out.println("--- bindInternal: calling startupAcceptor");
      startupAcceptor();

      // As we just started the acceptor, we have to unblock the select()
      // in order to process the bind request we just have added to the
      // registerQueue.
      System.out.println("--- bindInternal: calling wakeup()");
      wakeup();
      System.out.println("--- bindInternal: before await");
      // Now, we wait until this request is completed.
      request.awaitUninterruptibly();

      The implicit assumption here seems to be that either :

      1. The acceptor thread will run and call select() before this thread (the parent thread)
      calls wakep();
      OR
      2. The wakeup() is memorized and, even if the acceptor thread runs after the wakeup()
      was issued, its select() will be interrupted.

      Please let me know if this is a race bug, or if the code implicitly relies on #2 above.

      If not, then there should be some sort of synchronization between these 2 threads,
      to prevent the select in the acceptor being called before the wakeup in the parent thread.

      Related question: Are there other places in MINA that may exhibit this behavior ?

      Thanks.

      Florin

        Attachments

        1. build-output-jul16.txt
          23 kB
          Florin Sultan
        2. build-output-jul17.txt
          23 kB
          Florin Sultan
        3. build-test-failure-2.0.5.txt
          23 kB
          Florin Sultan
        4. StreamWriteFilterTest.output.txt
          22 kB
          Florin Sultan
        5. surefire-reports.tgz
          12 kB
          Florin Sultan
        6. surefire-reports-jul16.tgz
          12 kB
          Florin Sultan
        7. surefire-reports-jul17.tgz
          13 kB
          Florin Sultan

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              florinsultan Florin Sultan
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: