Uploaded image for project: 'MINA'
  1. MINA
  2. DIRMINA-779

SSLHandler can re-order data that it reads

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.7, 2.0.0-M1, 2.0.0-M2, 2.0.0-M3, 2.0.0-M4, 2.0.0-M5, 2.0.0-M6, 2.0.0-RC1
    • Fix Version/s: 2.0.8
    • Component/s: Filter
    • Labels:
      None

      Description

      The code in question is the flushScheduledEvents() method in SSLHandler.java:

       public void flushScheduledEvents() {
         // Fire events only when no lock is hold for this handler.
         if (Thread.holdsLock(this)) {
             return;
         }
      
         Event e;
               // We need synchronization here inevitably because filterWrite can be
         // called simultaneously and cause 'bad record MAC' integrity error.
         synchronized (this) {
             while ((e = filterWriteEventQueue.poll()) != null) {
                 e.nextFilter.filterWrite(session, (WriteRequest) e.data);
             }
         }
      
         while ((e = messageReceivedEventQueue.poll()) != null) {
             e.nextFilter.messageReceived(session, e.data);
         }
       }
      

      This method is called both by threads which handle writes, and threads that
      handle reads. Therefore, as the comments suggest, multiple threads may go
      through this code simultaneously. However, since there is no
      synchronization around processing of the messageReceivedEventQueue, it is
      possible that the received messages will be sent to the next filter out of
      order, should there be more than one message in the queue and a context
      switch happen at the wrong time.

      The bug would manifest in our application as a failure of our protocol layer
      to decode a message, we believe, due to a re-ordering. It only occurred in
      environments with a large amount of contention and network traffic and when
      using TLS. The fix I have tested was to move the closing brace of the
      synchronized block to extend to cover both while loops. I've attached a
      patch representing that change. Since making that change we have not
      encountered the bug again after about 30 hours of testing and 1.5 TB of
      traffic, whereas before the change we could reproduce it after a few
      minutes.

        Issue Links

          Activity

          Hide
          jresch@cleversafe.com Jason Resch added a comment -

          This diff was created from Mina 1.1.7. In testing it has alleviated the problem of protocol decoding errors.

          Show
          jresch@cleversafe.com Jason Resch added a comment - This diff was created from Mina 1.1.7. In testing it has alleviated the problem of protocol decoding errors.
          Hide
          elecharny Emmanuel Lecharny added a comment -

          Updated the release version

          Show
          elecharny Emmanuel Lecharny added a comment - Updated the release version
          Hide
          elecharny Emmanuel Lecharny added a comment -

          This is annoying. Englobing the two loops in the synchronized block makes testSSL fails (cf DIRMINA-650) in trunk.

          Does your problem occurs in MINA 2.0 ?

          Show
          elecharny Emmanuel Lecharny added a comment - This is annoying. Englobing the two loops in the synchronized block makes testSSL fails (cf DIRMINA-650 ) in trunk. Does your problem occurs in MINA 2.0 ?
          Hide
          elecharny Emmanuel Lecharny added a comment -

          Also a small test could help to find the problem and get it fixed.

          Show
          elecharny Emmanuel Lecharny added a comment - Also a small test could help to find the problem and get it fixed.
          Hide
          elecharny Emmanuel Lecharny added a comment -

          Postponed to 2.0.1

          Show
          elecharny Emmanuel Lecharny added a comment - Postponed to 2.0.1
          Hide
          elecharny Emmanuel Lecharny added a comment -

          Using a lock instead of a synchronized (this) does not break the ssltest, and should provide the right protection against concurrent access.

          Fixed with commit 56a6e58004ea4a6af5640f03d1f6796a073c5d62

          Show
          elecharny Emmanuel Lecharny added a comment - Using a lock instead of a synchronized (this) does not break the ssltest, and should provide the right protection against concurrent access. Fixed with commit 56a6e58004ea4a6af5640f03d1f6796a073c5d62

            People

            • Assignee:
              Unassigned
              Reporter:
              jresch@cleversafe.com Jason Resch
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development